I.  INTRODUCTION


Since 1929, when Edwin A. Link produced the first U.S.-built
synthetic trainer designed to teach people how to fly,
flight simulation has witnessed substantial advances in
simulation technology and increased incorporation of these devices
into both military and civilian aviation training programs.

A recent addition to these advances, introduced in 1980, was the
A-6E Weapon System Trainer (WST), device 2F114. A sophisticated
flight simulator with a six-degree-of-freedom motion system, the
A-6E WST was designed to provide the capability for pilot transition
training, Bombardier/Navigator (B/N) transition training,
integrated crew training, and maintenance of flight and weapon
system proficiency in all non-visual elements of the A-6E
Carrier Aircraft Inertial Navigation System (CAINS) mission.

The development of high-fidelity flight simulation has been
accompanied by advances in aircrew performance measurement
systems. These systems are well suited to the simulator training
environment; they have been widely implemented and have been the
subject of extensive research in all three military aviation
communities.

The purpose of this thesis is to improve current performance
measurement techniques for the B/N Fleet Replacement Squadron
(FRS) student by developing and applying a performance
measurement system that incorporates the advantages of both
objective skill acquisition measures and subjective instructor
measurement.

A.  BACKGROUND 

While simulation has been a popular component of many
aviation training systems for over forty years, objective
performance assessment was not incorporated until some
fifteen years ago. This section will discuss the current state
of flight simulation and aircrew performance measurement, and
provide a brief review of previous navigator performance
measurement studies.

1.  Simulation and Performance Measurement

Simulation is the technique of reproducing or imitating
some system operation in a highly controlled environment.

Modern flight simulators have evolved from simple procedure
trainers into devices that represent specific aircraft
counterparts and imitate or duplicate on-board systems and
environmental factors. The two main purposes of flight simulators
within the training environment are training and evaluation.
Training is designed to improve performance; it requires some
means of providing feedback that indicates to the student the
adequacy of his behavior and gives guidance for the correction
of inappropriate response patterns. Evaluation involves testing
and recording the student's behavior in the performance
examination situation [Angell, et al., 1964].

The reasons for using the simulator as an integral
part of a military flight training program were examined by
Tindle [1979], Shelnutt, et al. [1980], North and Griffin
[1977], and Roscoe [1976]. Some basic justifications for
simulator training include:

(1)  Simulation  provides  training  in  skill  areas  not 
adaptable  to  an  actual  training  flight  because  of 
technological  or  safety  considerations. 

(2)  Crews  master  skills  in  the  aircraft  in  less  time 
after  learning  those  skills  in  a  simulator. 

(3)  The  cultivation  of  decision-making  skills  is  an 
instructional  objective  calling  for  situational 
training  that  may  be  carried  out  safely  only  in  a 
simulated  tactical  environment. 

(4)  Simulators  are  effective  for  training  crewmembers 
of  varying  experience  and  expertise  in  a  variety 
of  aircraft  for  a  number  of  flight  tasks. 

(5)  Greater  objectivity  is  obtainable  for  measuring 
student  performance  by  using  controlled  conditions 
and  automated  performance  measurement  features  in 
the  simulator  than  in  the  aircraft. 

(6)  Instructors  are  not  distracted  by  operational 
constraints  in  the  simulator  and  are  more  available 
for  teaching  and  evaluation  roles. 

These  considerations  are  by  no  means  exhaustive,  but 
they  do  indicate  the  utility  of  simulators  in  flight  training 
programs,  especially  in  evaluating  student  performance. 

This  thesis  is  addressed  primarily  to  the  problem  of 
measuring  B/N  performance  during  a  radar  navigation  training 
flight  while  in  the  A-6E  WST.  The  performance  of  a  B/N  is 
the  exerted  effort  (physical  or  mental)  combined  with  internal 
ability  to  accomplish  the  A-6E  mission  and  its  functions.  Some 
development  of  performance  measurement  definitions  and  goals 
is necessary because the problem of assessing B/N performance
during a training program is but one segment of a
broader topic: the measurement of human behavior.

Glaser and Klaus [1966] defined performance evaluation
as the assessment of criterion behavior, or the determination
of the characteristics of present performance or output in
terms of specified standards. Defining performance evaluation
is paramount in any training assessment situation, as the
definition provides common ground for operationally describing
the human behaviors that make up performance itself, and
identifies behavior elements that may be measured by either
objective or subjective means. An expanded discussion of
performance measurement and evaluation can be found in
Chapter IV.


The purposes for assessing performance by the application
of standard objective measurement operations were
stated by Angell, et al. [1964], and Riis [1966]:

(1)  Achievement  -  to  determine  the  adequacy  with  which 
an  activity  can  be  performed  at  the  present  time, 
without  regard,  necessarily,  for  antecedent  events 
or  circumstances. 

(2)  Aptitude  -  to  predict  the  level  of  proficiency  at 
which  a  person  might  perform  some  activity  in  the 
future  if  he  were  given  instructions  concerning 
the  activity. 

(3)  Treatment efficacy - to observe the effects upon
performance of variation in some independent
circumstances such as (a) instructional techniques,
(b) curriculum content, (c) selection standards,
(d) equipment configurations, or the like.

The flight simulator training environment allows for special
applications of the above purposes, as found in Danneskiold
[1955], Angell, et al. [1964], Riis [1966], Glaser and Klaus
[1966], Farrell [1974], Shipley [1976], and McDowell [1978]:

(1)  Diagnostic  -  determine  strong  and  weak  areas  of 
student  proficiency. 

(2)  Readiness  -  determine  operational  readiness  of  an 
aviation  unit. 

(3)  Discrimination - assess performance to provide
information about an individual's present behavior
as compared to other individuals.

(4)  Selection  -  of  persons  for  promotion  or  advancement 
or  placement. 

(5)  Learning  rates  -  determining  the  rate  at  which 
learning  takes  place. 

(6)  Management - of an entire training program and its
subsystems.

(7)  Evaluation - of training devices in terms of
effectiveness and transfer-of-training.

The above goals of performance measurement represent
some of the major reasons why assessment of student performance
is important in simulator-based training programs. Most
importantly, measurement provides FRS instructors and training
officers with the information needed to make correct decisions
[Obermayer, et al., 1974; Vreuls and Wooldridge, 1977].
Performance measurement does not in itself replace the
decision-maker in the FRS, but instead provides complete and
necessary information of an objective nature to the appropriate
evaluator (instructor or training officer), so that more accurate
and reliable decisions can be made concerning student progress
within the training syllabus. If instructors and training
officers utilize the potential of a performance measurement
system, a more effective and efficient training program
would result.

2.  Review of Previous Studies


Most of the literature on aircrew performance measurement
in the last forty years has concentrated primarily on the
pilot crewmember [Ericksen, 1952; Danneskiold, 1955; Smode,
et al., 1962; Buckhout, 1962; Obermayer and Muckler, 1964;
Mixon and Moroney, 1981]. The first comprehensive evaluation
of techniques used in flight grading was by Johnson and Boots
[1943], who analyzed ratings given by instructors and
inspectors to students on various maneuvers throughout stages of
training. One result showed that correlations between grades
assigned by different raters to the same subject were very
low. This finding of low observer-observer reliability when
using subjective ratings will be discussed in the next section.

The earliest studies involving the radar navigation
performance of a crewmember other than the pilot were a
paper-and-pencil radar scope interpretation experiment by Beverly
[1952], and two Air Force radar bombing error projects by
Voiers [1954], and Daniel and Eason [1954]. The first study
was concerned with constructing a suitable test for the
measurement of navigational radar scope interpretation ability of
student aircraft observers. The latter two studies were
concerned, respectively, with identifying perceptual factors which
contributed to cross-hair error during bomb runs of a radar
bombing mission and with comparing the components of simulated
radar bombing error in terms of reliability and sensitivity to
practice. A similar follow-up study on radar scope
interpretation and operator performance in finding and
identifying targets using a radar was performed by Williams, et
al. [1960]. These four studies represent most of the research
on non-pilot radar navigation performance measurement prior
to 1965. This scarcity is not surprising, given the early
role played by the observer in very simplified and
pilot-oriented aircraft as compared to today's specialized
navigator in complex, computer-oriented aircraft.

Since navigation is a primary duty of any aviator
across a spectrum of aircraft types, some helicopter pilot
and copilot studies are worth reviewing. Helicopter
Nap-of-the-Earth (NOE) flight is a visually dominated low-level
mission in which altitude and airspeed vary in close proximity
to the ground. Navigational performance measures
used in these studies included the number of turn points found,
the probability of finding a turn point, and route excursions
beyond a criterion distance [Fineberg, 1974; Farrell and
Fineberg, 1976; Fineberg, et al., 1978; Smith, 1980]. Low-level
visual navigation flights in helicopters were also
studied in some detail, again with pilot performance being
the main concern [Lewis, 1966; Billings, et al., 1968;
Sanders, et al., 1979].
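
The three navigational measures named above can be made concrete
with a short sketch. The Python below is not taken from any of
the cited studies. It assumes a flat x-y coordinate frame, a
planned route made of straight legs between turn points, and a
flown track sampled as a list of positions. The function names
and the parameters capture_radius and criterion are hypothetical
and chosen only for illustration.

```python
from math import hypot


def distance_to_leg(p, a, b):
    """Shortest distance from point p to the straight route leg a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    leg_sq = dx * dx + dy * dy
    if leg_sq == 0:  # degenerate leg: both ends coincide
        return hypot(px - ax, py - ay)
    # Fraction along the leg of the closest point, clamped to the leg.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / leg_sq))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def cross_track_error(p, route):
    """Distance from p to the nearest leg of the planned route."""
    return min(distance_to_leg(p, a, b) for a, b in zip(route, route[1:]))


def turn_points_found(track, turn_points, capture_radius):
    """Count turn points that some track sample passed within capture_radius."""
    return sum(
        any(hypot(x - tx, y - ty) <= capture_radius for x, y in track)
        for tx, ty in turn_points
    )


def route_excursions(track, route, criterion):
    """Count separate departures beyond the criterion distance from the route."""
    excursions = 0
    outside = False
    for p in track:
        now_outside = cross_track_error(p, route) > criterion
        if now_outside and not outside:  # a new excursion begins here
            excursions += 1
        outside = now_outside
    return excursions


# Hypothetical flight: distances in arbitrary but consistent units.
route = [(0, 0), (10, 0), (10, 10)]
turn_points = route[1:]
track = [(0, 0), (5, 0.5), (7, 3), (10, 0.2), (10, 5), (13, 8)]

found = turn_points_found(track, turn_points, capture_radius=1.0)
print("turn points found:", found)
print("proportion found:", found / len(turn_points))
print("excursions:", route_excursions(track, route, criterion=2.0))
```

The proportion of turn points found on a flight serves here as a
simple estimate of the probability of finding a turn point. The
cited studies may have defined and pooled these measures
differently.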

Two rather novel investigations of Anti-Submarine
Warfare (ASW) helicopter team performance using the content
and flow of team communications during simulated attacks were
done by Federman and Siegel [1965] and Siegel and Federman [1968].
All team communications were recorded, classified, and
compared to target miss distance as an effectiveness measure.
Although they found some types of team communication to
correlate highly with mission success, their method was
highly impractical for the operational situation because of the
large number of personnel needed to play back and classify
the communication types. Nevertheless, the results of
this research indicate the value of using crew communication
as a measure of crew performance.

Several  fixed-wing  studies  with  navigation  as  the 
primary  mission  are  also  of  interest  to  the  current  study. 
Schohan,  et  al.  [1965]  and  Soliday  [1970]  used  the  Dynamic 
Flight  Simulator  for  several  Low  Altitude  High  Speed  (LAHS) 
missions  designed  to  investigate  pilot  and  observer  performance 
during turbulent, lengthy (3-hour) flights. Jensen, et al.
[1972] did several studies investigating pilotage errors in
area navigation missions for the Federal Aviation
Administration. These three studies are significant in that numerous
navigational  accuracy  performance  measures  were  used  to  assess 
pilot  (or  observer)  performance. 

After 1970, due to the increased complexity of many
modern aircraft, more research was directed toward individual
aircrew members, and not just the pilot. Among the aircraft
investigated were the P-3C, F-4J, A-7, F-106, B-52, C-141, C-130,
KC-135, and C-5 [Matheny, et al., 1970; Vreuls and
Obermayer, 1971; Obermayer and Vreuls, 1974; Geiselhart, et al.,
1976; Swink, et al., 1978]. These studies are unique in
that defining and assessing aircrew performance by means other
than the subjective ratings commonly used for decades became
a technological challenge requiring new analytical and
empirical approaches.

An Air Force fighter-bomber, the F-111D, was designed
and built during the late 1960's with virtually the same
tactical capability as the A-6E. With a two-man side-by-side
cockpit arrangement, this land-based aircraft is the closest
counterpart to the A-6E for the radar navigation air
interdiction mission. Two experiments using the F-111A flight
simulator were performed, mainly to study equipment configuration
effects on pilot performance [Geiselhart, et al., 1970;
Geiselhart, et al., 1971]. Research by Jones [1976] examined
the use of the F-111D flight simulator as an aircrew
performance evaluation device. Unfortunately, these F-111 studies
do not specifically address the issue of how to measure
navigator performance during radar navigation, but they do provide
some information on measuring performance in an aircraft with
a similar mission and almost the same crew interactions as
the A-6E.

Only one experiment known to this author has been
conducted using an A-6 configured simulation. Klier and Gage
[1970] investigated the effect of different simulation motion
conditions on pilots flying air-to-air gunnery tracking tasks
in the Grumman Research Vehicle Motion Simulator (RVMS).
They concluded that simulator motion need not be a faithful
reproduction of real-life motion in order to provide essential
motion cues. Saleh, et al. [1980] performed an analytical
study of two typical tactical combat missions representative
of the A-6E and A-7E aircraft to determine the significant
decisions made in the course of accomplishing mission
objectives. The results of this study provide information
regarding decision type, difficulty, and criticality, and
can be used to identify the critical areas in which aircrew
decision-aiding may significantly improve performance.

Finally, a study by Tindle [1979] concluded that integrating
the A-6E WST (device 2F114) into the FRS training
program would be more cost-effective than using the A-6E WST
as an addition to existing training programs. This study
also concluded that aircrew performance measurement in the
A-6E WST was vital for more effective use of the simulator.

This section has presented a brief review of aircrew
performance measurement studies in the literature. The
potential value in reviewing the literature lies in uncovering
the analytical and empirical approaches taken in measuring
aircrew performance, noting both the significance and
practicality of those approaches. Since previous research on actual
B/N performance during the radar navigation air interdiction
mission appears to be nonexistent, extrapolation from other
aircrew performance measurement studies is important and
necessary to the current study. What has worked and been
practical for other aircraft and aircrews in the way of
performance measurement certainly applies to the A-6E B/N,
provided that the definitions and goals of the A-6E B/N,
performance measurement, and the A-6E mission are kept in mind.

B.  SUBJECTIVE  AND  OBJECTIVE  PERFORMANCE  MEASUREMENT 
1.  Introduction 

Traditionally, aircrew performance in the Navy, Air Force,
and Army has been assessed by an instructor pilot or
navigator using a subjective rating scale
which places the student in one of several skill categories
based on norm-referenced testing. More recently, objective
methods of evaluating performance have been developed and
implemented in both the simulator and in-flight environments.
Subjective and objective methods are not dichotomous but
represent a continuum of performance measurement. At one
extreme is the strictly personal judgement and rating
of performance; at the other end of the continuum is
a completely automated performance measurement and assessment
system.

This section will define and describe the elements of
each method together with the advantages and disadvantages
associated with each. The approach taken in this study will
be to integrate the use of automatic performance measurement
within the A-6E training environment while still exploiting
the advantages of using the instructor as a component of the
measuring system.

2.  Subjective Performance Measurement


Subjective measurement can be defined as measurement in
which an observer's interpretation or judgement intervenes
between the act observed and the record of its excellence, with
almost complete reliance placed on the judgement and experience
of the evaluator [Cureton, 1951; Ericksen, 1952]. Simply stated,
subjective measurement is qualitative in nature, as what is being
measured is observed privately [Danneskiold, 1955; Knoop and
Welde, 1973; Roscoe, 1976; Vreuls and Wooldridge, 1977]. Through
an introspective process, the "expert" instructor judges the
performance level demonstrated by a student whether or not
agreed-upon standards of performance have been applied
[Billings, 1968; McDowell, 1978].

a.  Advantages  of  Subjective  Measurement 

The advantages of using subjective performance
measurement methods have been well documented throughout the
literature. Instructor ratings in the past have been the
least expensive of all evaluation methods [Marks, 1961]. This
decisive advantage has been eroded in recent times by severe
shortages of military aircrew in both operational and training
units. Marks [1961] also pointed out that rating forms are
constructed easily and quickly, and that the administration of
the rating system requires no physical arrangement. McDowell
[1978] recently concluded that subjective performance
measurement systems for many complex tasks such as flying are easy
to develop, have high face validity since the rating instructor
is usually an acknowledged expert, and contain specific
feedback of a type important in the training situation that is
usually not found in objective performance measurement systems.
Ratings are still used because they meet the needs of training
management without seriously intruding into the instructor
pilot's operational capabilities [Shipley, 1976]. One
important use of subjective grading by an instructor is
to motivate the student through selective reinforcement
[Prophet, 1972; Carter, 1977]. Sometimes an overly positive
or negative grade by the instructor in the appropriate area
of desired performance improvement serves as
a catalyst in the student's attitude toward self-improvement.

Some studies have shown that a fairly high degree of
reliability can be achieved between instructor ratings [Greer,
et al., 1963; Marks, 1961]. Brictson [1971], in a study of
over 2500 carrier arrested landings, reported measures derived
from the Landing Signal Officer (LSO) grades to be highly
correlated with objective estimates derived from a weighted
combination of wave-offs, bolters, and the particular wire
engaged. Similar high correlations between raters were also
found by Waag, et al. [1975] for undergraduate pilots flying
seven basic instrument maneuvers in the Air Force Advanced
Simulation in Undergraduate Pilot Training (ASUPT) facility.

b.  Disadvantages of Subjective Measurement

High reliability between instructors using
subjective rating methods is generally not the case, and therein
lies the foremost disadvantage of the subjective performance
measurement method. Knoop [1973] reported that the subjective
ratings of two instructor pilots (IPs) each correlated with
certain objective performance measures, but that the objective
measures themselves were not highly correlated
with skilled or proficient operator performance.

Ericksen [1952] reviewed numerous flight studies involving
pilot training between 1932 and 1952 and concluded that
subjective grading suffered from a lack of reliability and
inconsistent differentiation between students. Danneskiold [1955]
found observer-observer correlations no higher than .47 for three
Basic Instrument Check maneuvers, while a more objective test
had observer-observer correlations of .86.
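
An observer-observer reliability figure of this kind is a
correlation between the grades that two observers assign to the
same set of performances. The sketch below assumes the Pearson
product-moment coefficient, which the studies cited here may or
may not have used, and the grades shown are invented purely for
illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and of cross-products about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one rater never varies")
    return sxy / sqrt(sxx * syy)


# Hypothetical grades (1 = poor, 4 = excellent) given by two observers
# to the same six student flights.
observer_a = [3, 2, 4, 3, 1, 4]
observer_b = [2, 3, 4, 2, 2, 3]

print(f"observer-observer r = {pearson_r(observer_a, observer_b):.2f}")
```

A coefficient near 1 means the two observers rank the students
almost identically, while a value near 0 means that one
observer's grade says little about the other's.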

The training and evaluation skills of the instructor
evolve primarily from his personal experiences in the
highly complex aircraft and simulator environment.
Establishing adequate standards of performance, or criteria, is a major
problem in all flight training. Knoop and Welde [1973] found
a lack of agreement between pilots on the specific criteria for
successful performance of certain aerobatic maneuvers, due
largely to differences in examiner knowledge, experience,
and proficiency. It was also found that the same maneuver may
be flown satisfactorily in a number of different ways. Other
research has explored the criteria problem, which is inherently
part of subjective performance measurement [Cureton, 1951;
Danneskiold, 1955; Marks, 1961; McDowell, 1978]. Rating
methods were even found inadequate when used as criteria to
validate alternative methods of measurement and evaluation
[Knoop and Welde, 1973].

An instructor must be able to process large
quantities of information during a simulator session. Subjective
grading competes with this capability of the instructor, and
may prevent perception and evaluation of all the relevant
dimensions of task performance during a training mission
[Knoop, 1968; Roscoe, 1976; Shipley, 1976; Carter, 1977;
Vreuls and Wooldridge, 1977].

Several other factors which contribute to
subjective aircrew rating variances are discussed below:

(1)  A  tendency  of  raters  to  be  more  lenient  in 
evaluating  those  whom  they  know  well  or  are  particularly 
interested  in  [Smode,  et  al.,  1962;  Bowen,  et  al.,  1966]. 

(2)  The observations tend to accumulate on one
or two points in the rating scale, usually near the central
or average point. This phenomenon contributes toward a lack
of sufficient discrimination among students [Smode, et al.,
1962; Shipley, 1976].

(3)  Instructor  and  student  personalities  interact 
to  yield  a  result  which  does  not  reflect  true  performance 
[Marks,  1961]. 

(4)  A tendency for the ratings on specific
dimensions to be influenced by the rater's overall impression of
the student's performance - the "halo effect" [Smode, et al.,
1962; Glaser and Klaus, 1966].

(5)  An unrelated problem with the simulator
may influence the evaluation outcome [Jones, 1976].

(6)  Evaluation results are dependent upon the
attitude, concern, and values of the instructor, thus a
natural personal bias is introduced into the performance
observation [Smode, et al., 1962; Knoop and Welde, 1973;
Jones, 1976].

(7)  Instructors  have  different  concepts  of  the 
specific  grading  system  in  regard  to  the  flight  parameters 
involved,  knowledge  tested,  weights  to  be  assigned,  and  ranges 
of qualifying categories [Knoop and Welde, 1973].

(8)  A tendency to rate others in the
opposite direction from how the rater perceives himself on
the particular performance dimension has been found [Smode,
et al., 1962].

(9)  Ratings on different dimensions tend to be more
closely related when they are made close to each other
in time than when a larger interval of time separates the
observations [Smode, et al., 1962].

(10)  Unless the simulator has a playback
capability, a permanent record of the performance is lost when
subjective ratings are used [Forrest, 1970; Gerlach, 1975].

3.  Objective Performance Measurement

Objective measurement is defined as observation in which
the observer is not required to interpret or judge, but only
to record his observations [Cureton, 1951]. While subjective
judgement is more qualitative in nature, objective measurement
demands that what is being measured be observed publicly and
with a quantitative result [Knoop and Welde, 1973; Roscoe,
1976; Vreuls and Wooldridge, 1977; McDowell, 1978]. Objective
measures demand that performance be evaluated in terms of
criteria which are relatively independent of the observer, have
consistent interpretations, and show a high degree of
observer-observer reliability [Ericksen, 1952; Danneskiold, 1955;
Marks, 1961; Smode, et al., 1962].

The first systematic use of an objective grading method
was by Miller, et al. [1947]. Objective measures were collected
during a single week from over 8,000 students in four different
phases of pilot training. Objective observations by way of a
prepared checklist reduced variability attributable to the
observer, and correlations as high as .88 were found between
instructors observing a student during the same flight. In
most cases, higher observer-observer reliability has been found
when objective measures are used [Angell, et al., 1964; Forrest,
1970]. The measures are free from the personal and emotional
biases of the instructor, as well as the judgemental biases that
are characteristic of subjective measurement.

a.  Advantages  of  Objective  Measurement 

Most advantages of objective performance measurement
are the counterparts of the disadvantages of subjective
measurement. By having a computer process large amounts of
continuously varying information, the instructor is freed to
concentrate on those aspects of student performance which
resist objective measurement, and to devote more attention
to the primary duty of instructing [Krendel and Bloom, 1963;
Knoop and Welde, 1973; Vreuls and Obermayer, 1971; Vreuls, et
al., 1974]. Development of performance criteria could be made
on the basis of permanently recorded objective measures
[Forrest, 1970; Angell, et al., 1964]. A system of data
collection would provide records and transcriptions of individual
and crew performance in practice missions to identify
particularly effective or ineffective behaviors for later analysis
in the event of an aircraft accident [Forrest, 1970; Angell,
et al., 1964].

Objective measurement provides timely and diagnostic
information on consistent weaknesses in performance [Angell,
et al., 1964; Knoop and Welde, 1973]. Instructional methods
may be modified as performance results indicate their
effectiveness, or lack of it. Students attaining desired
achievement levels may also be identified earlier within the
training syllabus. Several researchers postulate the
quantification of skill learning rates [Angell, et al., 1964; Knoop
and Welde, 1973]. Bowen, et al. [1966] even found that objective
measures motivated pilots to actualize their skills in overt
performance measurement. They speculated that the "heightening
of performance is due to intrinsic motivation (personal desire to
achieve), social motivation (pressures from group to
demonstrate proficiency), and a focusing of attention on each
particular performance which encourages the pilot to actualize
the knowledge and skills which he possesses."

b.  Disadvantages  of  Objective  Measurement 

If  objective  performance  measures  are  so  much  more 
desirable  than  subjective  ratings,  why  have  they  not  been  in¬ 
corporated  into  more  training  situations?  The  major  reason 
lies  in  the  fact  that  they  are  much  more  expensive  than  sub¬ 
jective  methods  and  some  aircrew  tasks  are  difficult  to  auto¬ 
matically  record  and  computer  grade.  Whenever  a  number  of 
simultaneously  occurring  tasks  such  as  communication,  proced¬ 
ures,  and  the  application  of  knowledge  are  present  during  a 
particular  task,  measuring  and  quantifying  the  operator  be¬ 
havior  involved  becomes  a  complex  and  inherently  difficult 
task  in  itself.  Smode  [1966]  found  that  when  flight  instruc¬ 
tors  are  required  to  monitor  and  record  performance  information 
during  a  flight  task,  some  resentment  against  the  objective 
measurement  method  occurred  due  to  the  large  amount  of  instruc¬ 
tor  attention  required.  This  instructor  monitoring  and  record¬ 
ing  of  student  performance  information  has  since  been  replaced 
by automatic digital computers. The same report also concluded
that detailed analyses of aircrew tasks into perceptual-motor
tasks, procedural tasks, and decision-making distort reality,
as all three are closely interrelated. Objective tests are
rigidly  constrained  by  given  hardware  that  has  to  be  physically 
arranged,  programmed,  and  is  subject  to  equipment  malfunctions. 
Since  objective  tests  require  more  structure  than  subjective 
testing, instructors are given very little choice in what
they  must  do  and  "there  is  a  certain  natural  resentment  against 
the  regimentation  of  setting  up  and  observing  this  event  at 
this  time  [Smode,  1966]." 

4. Summary

Subjective  and  objective  performance  measurement  of 
aircrew  has  been  defined,  compared,  and  contrasted  for  strengths 
and  weaknesses.  Subjective  testing  is  universally  feasible, 
minimizes  paperwork,  allows  for  instructor  flexibility,  is  easy 
to  develop  and  administer,  and  is  inexpensive.  Objective  test¬ 
ing  minimizes  instructor  bias,  eases  grading  due  to  automatic 
data  collection,  storage  and  dissemination,  improves  perfor¬ 
mance  standardization,  and  produces  a  high  inter-rater  reli¬ 
ability.  Each  method  has  its  merits  individually,  but  when 
used  together  in  a  cohesive  and  synergistic  combination,  im¬ 
provements  can  be  made  in  aircrew  performance  measurement  and 
assessment.

Angell,  et  al.  [1964]  stated,  "There  are  some  areas  in 
which  the  human  observer  can  make  more  subtle  judgements  and 
more  sophisticated  evaluations  than  can  any  electromechanical 
instruments ... the human observer/teacher should not be an
adjunct, but rather an integral part of the total measurement
system." This report will attempt to use this observation in
the design of a system to measure A-6E B/N performance during
radar navigation in the simulator. This section is concluded
by a listing of items the examiner should evaluate, as determined
by Knoop and Welde [1973]:

(1) Ability to plan effectively.

(2)  Decision-making  capability. 

(3)  Sensorimotor  coordination  and  smoothness  of  control. 

(4)  Ability  to  share  attentions  and  efforts  appropriately 
in  an  environment  of  simultaneous  activities. 

(5)  Knowledge  and  systematic  performance  of  tasks. 

(6)  Confidence  proportionate  to  the  individual's  level 
of  competence. 

(7)  Maturity;  willingness  to  accept  responsibility,  the 
ability  to  accomplish  stated  objectives,  judgements, 
and  reaction  to  stress,  unexpected  conditions,  and 
aircraft  emergencies. 

(8)  Motivation  (attitude)  in  terms  of  the  manner  in 
which  it  affects  performance. 

(9)  Crew  coordination. 

(10)  Fear  of  flying. 

(11)  Motion  sickness. 

(12)  Air  discipline  -  adherence  to  rules,  regulations, 
assigned  tasks,  and  command  authority. 

C. A-6E TRAM AIRCRAFT AND ITS MISSION

1. A-6E Performance Specifications

The  A-6E  aircraft  is  a  two-man,  subsonic,  twin  engine 
medium  attack  jet  aircraft,  with  side-by-side  seating  for  the 
pilot  and  B/N.  Designed  as  a  true  all-weather  attack  aircraft 
using  a  sophisticated  radar  navigation  and  attack  system,  the 
aircraft  can  accurately  deliver  a  wide  variety  of  weapons 
without  the  crew  ever  having  visually  acquired  the  ground  or 
the  target.  Capable  of  carrying  a  payload  of  up  to  8.5  tons, 
it  is  the  only  carrier-based  aircraft  in  the  Tactical  Air 
(TACAIR) wing capable of penetrating enemy defenses at night,
or  in  adverse  weather,  to  detect,  identify  and  attack  fixed 
or  moving  targets.  The  TRAM  (Target  Recognition,  Attack, 
Multisensor)  configured  A-6E  aircraft  has  a  completely  inte¬ 
grated  computer  navigation  and  control  system,  radar,  armament, 
flight  sensors,  and  cockpit  displays  that  enable  the  aircraft 
to  penetrate  enemy  defenses  at  distances  approaching  600 
nautical  miles  in  radius  while  at  an  extremely  low  altitude. 

2. Mission

The mission of the A-6 "Intruder" is to perform high
and  low  altitude  all-weather  attacks  to  inflict  damage  on  the 
enemy  in  a  combat  situation.  TACAIR  recognizes  three  primary 
missions  to  accomplish  the  objective  of  successfully  waging 
war [Gomer, 1979]. The missions are: Close Air Support (CAS),
Counter Air (CA), and Air Interdiction (AI). CAS is air action
against  hostile  ground  targets  that  are  in  close  proximity  to 
friendly  ground  forces,  requiring  detailed  integration  of  each 
air  mission  with  the  battle  activities  and  movements  of  those 
forces.  CA  operations  involve  both  offensive  and  defensive 
air  actions  conducted  to  attain  or  maintain  a  desired  degree 
of  air  superiority  by  the  destruction  or  neutralization  of 
enemy  air  forces.  AI  missions  are  conducted  to  destroy,  neu¬ 
tralize,  or  delay  the  enemy's  military  potential  before  it 
can  be  brought  to  bear  against  friendly  forces,  usually  at 
far  distances  not  requiring  detailed  integration  of  air  and 
ground  activities.  The  AI  mission  was  selected  in  this  study 
because it is representative of those missions frequently
performed  in  the  A-6  community.  Analysis  of  all  three  primary 
TACAIR  missions  was  beyond  the  scope  of  this  study. 

Saleh  [1980]  defined  a  mission  as  the  aggregate 
scenarios,  maneuvers,  and  segments  that  constitute  successful 
employment  of  the  system.  The  starting  point  in  determining 
criteria  for  performance  measurement  and  suggesting  what  spe¬ 
cific  and  clearly  identifiable  operations  of  the  B/N  should 
be  examined  in  greatest  detail  is  an  operational  definition 
of  the  man-machine  mission  [Smode,  1962;  Vreuls,  1974].  McCoy 
[1963]  further  stated  that  in  order  to  judge  the  effectiveness 
of  any  element  of  a  man-machine  system,  it  must  be  judged  in 
terms  of  contribution  of  the  element  to  the  final  system  out¬ 
put,  which  is  the  ultimate  objective  of  the  man-machine  system. 
It  is  with  these  criteria  in  mind  that  the  AI  mission  defini¬ 
tion  is  used  to  limit  performance  measurement  of  the  B/N  in 
the  A-6E  WST. 

3. Scenarios

Analysis  for  comprehensive  performance  measurement 
begins  with  a  complete  decomposition  of  the  mission  into  smal¬ 
ler  parts  for  which  activities  and  performance  criteria  are 
more  easily  defined  [Vreuls,  1974;  Connelly,  1974;  Vreuls  and 
Cotton, 1980]. Any mission may be described in terms of a
scenario, or intended flight profile or regime. A performance
measurement standards tri-service project [Vreuls and Cotton,
1980] classified military aviation into ten scenarios or
flight regimes: (1) Transition/Familiarization, (2) Navigation,
(3) Formation, (4) Instruments, (5) Basic Fighter Maneuvers,
(6) Air Combat Maneuvering, (7) Air-to-Air Intercept,
(8) Ground Attack, (9) Air Refueling, and (10) Air Drop.
Scenarios may be further subdivided into maneuvers by identifying
natural breakpoints using time, position, or definitive
portions requiring computation of different performance measures
or changes in required operator skill level. Examples
of maneuvers are take-off, climb, landing, and point-to-point
navigation. Segments are subdivisions of maneuvers that contain
groupings of those activities that must be accomplished
in performing the maneuver. Table I (adapted from Vreuls and
Cotton  [1980])  contains  possible  maneuvers  and  segments  for 
the  navigation  scenario. 
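
The scenario, maneuver, and segment hierarchy described above can be sketched as a simple nested structure. This is only an illustration: the names are those listed in Table I, while the dictionary layout, the grouping of segments under each maneuver as read from the table, and the helper find_segment are assumptions introduced here, not part of the original study.

```python
# Hypothetical sketch of the scenario -> maneuver -> segment hierarchy.
# Names are taken from Table I; the data layout is an assumption.
NAVIGATION = {
    "Point-to-Point Flight": [
        "Dead Reckoning",
        "Contact (visual)",
        "Inertial",
        "Radar Terrain Mapping",
    ],
    "Nap-of-the-Earth": ["Very Low Map Interpretation"],
    "Airways-Radio": ["VOR", "TACAN", "ADF"],
    "Off Airways-Radio": ["Area Navigation"],
    "Over Water": ["LORAN", "Celestial", "Global Positioning System"],
}

# Only the navigation scenario is filled in; the other nine are omitted.
SCENARIOS = {"Navigation": NAVIGATION}


def find_segment(segment):
    """Return every (scenario, maneuver) pair that contains the segment."""
    return [
        (scenario, maneuver)
        for scenario, maneuvers in SCENARIOS.items()
        for maneuver, segments in maneuvers.items()
        if segment in segments
    ]


# Locate the segment selected for the present study.
print(find_segment("Radar Terrain Mapping"))
```

Looking up "Radar Terrain Mapping" returns the Navigation scenario and the Point-to-Point Flight maneuver, which is the path through the hierarchy that the present study follows.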

The  present  study  selected  point-to-point  navigation 
using  radar  terrain  mapping  to  further  narrow  the  scope  of  the 
effort  and  to  tailor  the  performance  measurement  aspects  of 
the  A-6E  mission  toward  the  tasks  of  the  B/N. 

4. Summary

The mission of the A-6 for the current study has been
defined as Air Interdiction, which is further subdivided into
point-to-point navigation using radar terrain mapping. Successful
accomplishment of the radar navigation point-to-point segments
reasonably implies some degree of overall Air Interdiction
mission accomplishment, which is the overall objective of the
A-6 man-machine system. Operationally dividing the overall A-6
mission into flight maneuvers enables practical performance
measurement  as  the  operator  skills  required  (and  measured) 
vary  from  segment  to  segment.  Within  each  segment,  measure¬ 
ment  is  conceivably  possible  at  two  levels:  (1)  measurement 
of  the  total  man-machine  system  outputs  for  comparison  to 
expected  mission  goals,  and  (2)  measurement  of  human  operator 
activity  in  relation  to  system  outputs  [Vreuls,  1974]. 


TABLE I: NAVIGATION MANEUVERS AND SEGMENTS

SCENARIO     MANEUVERS               SEGMENTS

Navigation   Point-to-Point Flight   Dead Reckoning
                                     Contact (visual)
                                     Inertial
                                     Radar Terrain Mapping

             Nap-of-the-Earth        Very Low Map Interpretation

             Airways-Radio           VOR
                                     TACAN
                                     ADF

             Off Airways-Radio       Area Navigation

             Over Water              LORAN
                                     Celestial
                                     Global Positioning System

Source: Vreuls and Cotton [1980]

II. STATEMENT OF PROBLEM


The  need  for  aircrew  performance  measurement  and  assessment 
has  long  been  recognized  across  all  aviation  communities.  Per¬ 
formance  measurement  produces  information  needed  for  a  specific 
purpose,  such  as  the  evaluation  of  student  performance  or  the 
identification  of  aircrews  needing  training.  Unfortunately, 
the  assessment  of  aircrew  proficiency  in  those  skills  associ¬ 
ated  with  advanced  flying  training  still  depends  largely  on 
subjective  evaluations  by  qualified  instructor  pilots  (IPs) 
and instructor B/Ns (IB/Ns), supplemented with analytically
derived, somewhat objective mission performance metrics, e.g.,
bombing scores [Obermayer, et al., 1974]. An economically
acceptable  means  of  objectively  measuring  behavioral  skills 
in  the  operational  or  crew  training  environment  has  continued 
to  be  a  critical  problem  in  the  FRS,  due  mainly  to  a  "nice  to 
have"  and  nonessential  outlook  towards  any  performance  measure¬ 
ment  scheme  other  than  the  traditional  "always  done  this  way" 
method  of  subjective  ratings.  A  Department  of  Defense  review 
of  tactical  jet  operational  training  in  1968  commented:  "The 
key  issue  underlying  effective  pilot  training  is  the  capability 
for  scoring  and  assessing  performance  ...  in  essence,  the 
effectiveness  of  training  is  dependent  upon  how  well  perfor¬ 
mance  is  measured  and  interpreted  [Office  of  Secretary  of 
Defense, 1968]."

Despite the key issue of scoring and assessing performance
being  identified,  current  aircrew  performance  measurement  by 
IPs  and  IB/Ns  during  simulated  missions  in  the  A-6E  WST  is  all 
subjective  in  nature,  although  the  2F114  simulator  has  a 
current  objective  performance  measurement  capability.  More 
details  on  current  aircrew  performance  measurement  by  instruc¬ 
tors  in  the  A-6E  WST  will  be  given  in  Chapter  VI. 

A.  PROBLEM  STATEMENT 

Current  student  performance  measurement  and  assessment  in 
the  A-6E  WST  by  an  instructor  is  entirely  subjective  in  nature. 
The  A-6E  WST  has  the  capability  to  objectively  measure  student 
performance, but is not being utilized in this fashion. Measuring
performance is the key to training effectiveness. Objective
measurement for aviation training programs has been
prescribed  by  higher  authority.  Effective  performance  measure¬ 
ment  by  using  objective  methods  is  vital  to  establishing  per¬ 
formance  criteria,  the  effective  utilization  of  the  simulator, 
instructor  effectiveness,  aircrew  skill  identification  and 
definition,  and  the  Instructional  Systems  Development  (ISD) 
systems  approach  to  training.  The  problem  that  must  be  ad¬ 
dressed  is  designing  a  performance  measurement  and  evaluation 
system  for  the  B/N  during  radar  navigation  that  will  incorporate 
the  characteristics  of  objective  performance  measurement  and 
still  retain  the  judgement  and  experience  of  the  instructor 
as a valuable measuring tool. This thesis will focus upon
performance measurement as a system with definable components
that interact and produce information necessary for the successful
identification of the skill level of the student with
regard to navigating the A-6E aircraft. As a result, the A-6E
WST  can  be  utilized  more  effectively,  instructors  can  become 
more  effective  in  teaching  students  critical  combat  skills, 
and  students  can  complete  FRS  training  being  identified  at  a 
minimum  skill  level  and  "mission  ready"  for  full-system  radar 
navigation  in  the  A-6E  aircraft. 

B.  THE  IMPORTANCE  OF  OBJECTIVE  PERFORMANCE  MEASUREMENT 

A  number  of  factors  have  contributed  to  the  emerging  role 
of  objective  aircrew  performance  measurement  in  both  actual 
flight  and  simulators  of  military  aviation  units.  Generally, 
this  role  has  developed  through  an  increased  awareness  of  the 
advantages  associated  with  objective  measurement,  and  the 
several  basic  disadvantages  of  the  subjective  evaluation 
method,  as  outlined  in  Chapter  I. 

The  remainder  of  this  section  will  outline  both  potential 
and  actual  necessities  for  objective  performance  measurement 
of  aircrew  in  the  training  environment.  Beginning  with  a 
study  of  Department  of  Defense  policy  toward  aircrew  perfor¬ 
mance  and  evaluation  methods,  the  benefits  of  objective  per¬ 
formance  measurement  are  discussed  in  regards  to:  standards 
establishment,  increased  simulator,  instructor,  and  training 
effectiveness,  aircrew  skill  level  identification  and  defini¬ 
tion,  and  lastly,  ISD  requirements. 


1.  Policy  Guidance 

Several  studies  offer  guidance  with  respect  to  the 
issue  of  using  more  objective  measurement  techniques  in  air¬ 
crew  training.  This  guidance  supports  the  development  and 
utilization  of  objective  performance  measurement  as  an  adjunct 
or  complement  to  current  subjective  ratings.  In  1968,  the 
Department  of  Defense  review  previously  cited  found  that 
"subjective  evaluation  was  the  technique  in  general  use  in 
training  programs  observed"  and  had  been  since  before  World 
War  II  [Office  of  Secretary  of  Defense,  1968].  The  study  went 
on  to  comment,  "Judgement  and  experience  can  be  helped  by 
quantitative  analytical  methods"  and  that  the  application  of 
such  methods  serves  three  purposes: 

(1)  They  make  it  necessary  to  identify  the  standards  of 
performance  desired  for  each  of  the  many  events  the 
pilot  must  learn. 

(2)  They  determine  how  many  practices  or  trials  a 
student  must  accomplish,  on  the  average,  to  meet 
the  desired  standard. 

(3)  They  tell  the  manager  how  much  improvement  he 
normally  may  anticipate  with  each  additional 
practice  or  trial. 
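
Purposes (2) and (3) above amount to reading two numbers off recorded trial scores: how many trials a student needs, on the average, to reach the standard, and how much a score changes with each additional trial. A minimal sketch follows, assuming that higher scores are better and that a standard has already been fixed under purpose (1). The scores, the standard of 75, and all function names are hypothetical and are not drawn from the Department of Defense review.

```python
from statistics import mean


def trials_to_standard(scores, standard):
    """Return the 1-based trial on which the standard is first met, else None."""
    for trial, score in enumerate(scores, start=1):
        if score >= standard:
            return trial
    return None


def mean_gain_per_trial(scores):
    """Return the average change in score from one trial to the next."""
    gains = [later - earlier for earlier, later in zip(scores, scores[1:])]
    return mean(gains) if gains else 0.0


# Made-up trial scores for two students (higher is better).
students = {
    "student_a": [50, 60, 70, 80],
    "student_b": [40, 55, 65, 72, 80],
}
STANDARD = 75  # hypothetical standard of performance

reached = [trials_to_standard(s, STANDARD) for s in students.values()]
reached = [t for t in reached if t is not None]  # ignore students who never met it

avg_trials = mean(reached)
print("average trials to standard:", avg_trials)
for name, scores in students.items():
    print(name, "mean gain per trial:", mean_gain_per_trial(scores))
```

With real data, a student who never reaches the standard would be excluded from the average, as above, and would need to be reported separately.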

This  study  concluded:  "The  services  should  apply  objective 
evaluation  techniques  where  currently  feasible  in  parts  of 
existing  training  programs  ..."  and  "where  valid  performance 
data in aircrew training programs can be recorded and stored,
quantitative  analytical  methods  should  be  used  to  assist  the 
commander in making decisions concerning revising and adjusting
the course."

A 1973 report to the Congress by the Comptroller General of the
United States (General Accounting Office) on the use of flight
simulators in military pilot training programs stated,
"Simulators could also be used to more accurately measure pilot
proficiency by using systematic grading procedures." The report
noted that Navy grading instructions were not standardized and
did not show performance tolerances. Conclusions reached were:

Objective  grading  of  pilot  proficiency  using  simulators 
would  provide  more  consistent  and  accurate  results  for 
many phases of flight training and eliminate the possibility
of human bias and error associated with the
current evaluation method ... simulator grading
accurately  evaluates  pilot  proficiency  for  certain 
flight  maneuvers. 

2. Establishment of Standards

The performance criterion, or standard, is a statement
or measure of performance level that the individual or group
must achieve for success in a system function or task [Office
of Secretary of Defense, 1968]. When performance standards
are  established  on  the  basis  of  subjective  experience  and 
expertise,  the  result  in  most  cases  will  be  inadequate.  When 
standards  are  set  too  low,  some  risk  is  incurred  with  degraded 
system  effectiveness.  When  set  too  high,  costly  overtraining 
is  the  result  [Riis,  1966;  Office  of  Secretary  of  Defense, 
1968;  Campbell,  et  al.,  1976;  Deberg,  1977;  Rankin  and 
McDaniel,  1980].  The  establishment  of  a  standard  or  baseline 
of  performance  is  an  important  result  of  objective  performance 
measurement. The Department of Defense review in 1968 stated,
"Reliable measures of pilot performance against validated
standards  is  the  keystone  for  determining  how  much  instruction 
and  practice  is  required  to  attain  desired  levels  of  skills 
[Office  of  Secretary  of  Defense,  1968]."  Even  though  the 
importance of performance standards is recognized, A-6 FRS
instructors have expressed some concern about establishing
operational standards for aircrew performance, due to
possible misuse, incorrect adaptation in the training program,
or insufficient assessment before implementation [Campbell,
et al., 1976]. This issue will be addressed in Chapter VII.

3. Effective Use of Simulators

Objective performance measurement increases the effective
use of simulators. When performance measures and criteria
are defined, entered, and monitored by an automatic system
requiring little instructor intervention, other training and
teaching functions of the simulator may be used by the instructor;
thus an increase in the effective use of simulators occurs
[Danneskiold, 1955; Knoop, 1968].
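
As an illustration of what automatic monitoring against a defined criterion might involve, the sketch below scores a recorded error trace with two simple summary measures. It is a sketch under stated assumptions rather than a description of the 2F114 software: the choice of cross-track error as the measure, the sample values, the 1.0 nautical mile tolerance, and both function names are hypothetical.

```python
import math


def rms_error(errors):
    """Root-mean-square of a recorded error trace."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def fraction_out_of_tolerance(errors, tolerance):
    """Fraction of samples whose absolute error exceeds the tolerance."""
    outside = sum(1 for e in errors if abs(e) > tolerance)
    return outside / len(errors)


# Made-up cross-track error samples in nautical miles (hypothetical data).
cross_track_nm = [0.0, 0.5, -1.0, 1.5, -0.5]
TOLERANCE_NM = 1.0  # hypothetical criterion entered by the instructor

print("RMS error:", rms_error(cross_track_nm))
print("fraction out of tolerance:",
      fraction_out_of_tolerance(cross_track_nm, TOLERANCE_NM))
```

Once such measures are computed without instructor attention, the instructor only needs to review the summary values instead of watching the instruments throughout the mission.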

4. Effectiveness of Instructors

The  major  impact  of  an  effective  measurement  method  on 
the  instructor  during  a  simulator  mission  would  be  to  free  him 
from  monitoring  dials,  Cathode  Ray  Tubes  (CRTs),  and  lights 
for  aircrew  performance  measurement  and  evaluation.  Due  to 
the  complexity  of  the  A-6E  WST,  the  simulator  instructor  is 
humanly  unable  to  monitor  and  interpret  in  real-time  all  per¬ 
tinent  performance  information  during  a  training  mission. 
Objective measurement techniques would relieve the instructor
of these monitoring duties, enabling more time for teaching
and evaluating those aspects of human performance that are
inaccessible by objective methods [Smode and Meyer, 1966;
Knoop, 1968; Vreuls, et al., 1974; Kemmerling, 1975; Carter,
1977; Charles, 1978; Semple, et al., 1979]. When performance
standards  are  established  by  objective  methods,  instructor 
judgements  can  be  made  more  reliable  and  valid  by  confining 
the  instructor's  judgement  to  evaluating  performance  without 
the  additional  burden  of  establishing  and  adjusting  personal 
standards [Office of Secretary of Defense, 1968]. Efficiency
of  instructor  utilization  may  be  achieved  by  allowing  instruc¬ 
tors  more  flexibility  in  identifying  and  assisting  students 
who  are  found  to  be  deficient  from  objective  performance 
measurement  feedback  of  criterion  levels  reached  [Carter,  1977; 
Deberg,  1977;  Kelly,  et  al.,  1979].  Such  objective  information 
might also provide instructors diagnostic information about
their own performance as teachers after seeing patterns of
strengths and weaknesses in their students [Kelly, et al.,
1979].

5. Effectiveness of Training

Many  factors  influence  simulator  training  effective¬ 
ness,  including:  simulator  design,  the  training  program, 
students,  instructors,  and  the  attitude  of  personnel  towards 
the  simulator  [Tindle,  1979].  Smode  and  Meyer  [1966],  in  a 
review  of  Air  Force  pilot  training,  concluded:  "The  development 
of objective scores to be used in simulator training would
represent  a  major  step  toward  improving  the  effectiveness  of 
pilot  training  programs."  Other  notable  results  of  objective 
measurement  can  also  contribute  to  increased  training  effec¬ 
tiveness.  The  real-time  feedback  of  performance  measurement 
to  the  student  is  essential  to  the  fundamental  concept  of 
knowledge  of  results,  a  prerequisite  to  any  learning  process. 
Quantitative  feedback,  in  turn,  allows  the  student  and  instruc¬ 
tor  to  determine  the  student's  individual  strengths  and  weak¬ 
nesses  in  performing  the  mission,  which  may  then  be  concentrated 
on  by  the  instructor  for  remedial  training  [Smode  and  Meyer, 
1966; Obermayer, et al., 1974; Deberg, 1977; Carter, 1977;
Pierce,  et  al.,  1979;  Kelly,  et  al.,  1979].  Modifications  in 
training  methods,  course  content,  and  sequence  of  course 
material  could  be  more  accurately  assessed  by  the  FRS  training 
officer [Vreuls and Obermayer, 1971; Pierce, et al., 1979;
Kelly, et al., 1979]. Student progress within a training
program  can  be  more  accurately  monitored,  culminating  with 
the  introduction  to  the  fleet  of  an  "operationally  capable" 
or  "mission  ready"  aircrew  member  at  minimum  cost  [Riis,  1966; 
Campbell,  et  al.,  1976;  Pierce,  et  al.,  1979]. 

6. Skill Identification and Definition

The  employment  of  objective  measures  in  simulator 
training  will  enable  the  identification  and  definition  of 
critical combat skills of mission ready aircrews. The precise
definitions of "current" and "proficient" and the quantitative
measurement of these concepts continue to be a major problem
in both training and fleet environments today [McMinn, 1981].
Objective  performance  measurement  requires  the  definition  of 
"proficient"  as  a  prerequisite  to  quantification  of  perfor¬ 
mance  [Pierce,  et  al.,  1979]. 

7. Instructional Systems Development

Instructional Systems Development is currently being
applied  to  military  flight  training  systems.  The  approach 
requires  extensive  analysis  of  the  specific  training  to  be 
accomplished,  the  behavioral  objective  for  each  task  to  be 
trained,  and  the  level  of  proficiency  required  [Vreuls  and 
Obermayer, 1971]. In support of ISD, measures and a measurement
system are necessary to: (1) perform analyses of systems
in  their  operational  environments,  (2)  establish  quantitative 
instructional  standards,  (3)  provide  an  index  of  achievement 
for  each  behavioral  objective,  and  (4)  evaluate  alternative 
instructional  content,  approaches,  and  training  devices  [Vreuls 
and  Obermayer,  1971;  Obermayer,  et  al.,  1974;  Deberg,  1977; 
Prophet,  1978;  Kelly,  et  al.,  1979]. 

When  a  state-of-the-art  flight  simulator  is  available 
to  an  ISD  team,  it  should  be  the  basic  medium  around  which 
the course is organized [Prophet, 1978]. Campbell, et al.
[1976] applied the ISD process to the design of an A-6E aircrew
training program and used the A-6E WST for a large part
of student training. The study concluded that "difficulty
was experienced in applying the ISD process to the development
of Specific Behavioral Objectives (SBOs) and criteria test
statements,  due  to  the  lack  of  documented  quantitative  standards 
of  performance."  In  a  review  of  U.S.  Navy  fleet  aviation  train¬ 
ing  program  development,  Prophet  [1978]  reviewed  the  A-6E  ISD 
application  as  well  as  three  other  major  ISD  efforts  for  various 
aircraft. That study concluded that one "...
major  shortcoming  was  in  the  area  of  performance  measurement 
and  evaluation,"  and  recommended  measurement  as  a  possible 
future  area  for  improvement  to  the  ISD  model. 

The  need  for  incorporating  objective  performance 
measurement  methods  has  been  addressed.  The  methodology  for 
the  introduction  of  objective  performance  measurement  into 
the  A-6E  WST  for  B/N  performance  will  now  be  discussed. 
III. METHODOLOGY


A.  PROCEDURE 

The  methodology  used  in  formulating  a  model  to  measure 
B/N  performance  during  radar  navigation  in  the  A-6E  WST  was 
based  on  an  extensive  literature  review  and  an  analytical  task 
analysis of the B/N's duties. Figure 1 illustrates the approach
taken in this report. After selection of the mission,
scenario,  and  segment  of  interest,  the  review  concentrated  on 
aircrew  performance  measurement  research,  which  emphasized 
navigation,  training,  and  skill  acquisition.  A  model  was  then 
formulated  to  show  the  relationship  among  student  skill  acqui¬ 
sition,  performance  evaluation,  and  the  radar  navigation  task. 
This  hybrid  model,  discussed  in  Chapter  V,  was  improvised  by 
the  author  specifically  to  illustrate  difficult  concepts  of 
aircrew  performance  measurement  and  evaluation.  The  literature 
review  identified  different  approaches  taken  in  using  perfor¬ 
mance  measurement  from  a  systems  point  of  view,  some  of  which 
were  integrated  and  applied  to  the  current  situation. 

An  in-depth  task  analysis  of  the  B/N  was  performed  with 
the  purpose  of  generating  candidate  performance  measures  for 
operator  behavior.  Skills  and  knowledge  required  to  perform 
the  radar  navigation  maneuver  were  identified  and  a  mission 
time  line  analysis  was  conducted  to  identify  tasks  critical 
to performance. A model was formulated of the A-6E crew-system
interactions to illustrate the complexity involved in measuring
B/N performance. After defining the purpose of measuring
B/N performance, candidate performance measures were identified
for possible use in measuring B/N performance.

Figure 1. Methodology Flow Diagram.

Candidate  performance  measures  from  the  literature  and  the 
task  analysis  were  compared  and  measures  were  selected  that 
met  the  criteria  of  face  validity,  ease  of  use,  instructor  and 
student  acceptance,  and  appropriateness  to  the  training  envi¬ 
ronment.  These  candidate  measures  were  then  compared  to  cur¬ 
rent  B/N  student  performance  measurement  and  generic  performance 
measurement  systems.  The  result  was  a  performance  measurement 
system  for  the  B/N  during  radar  navigation  in  the  A-6E  WST. 
Evaluation  models  were  then  investigated;  a  sequential  sampling 
decision  model  was  selected  for  B/N  performance  evaluation. 
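
The general idea of a sequential sampling decision can be sketched with Wald's sequential probability ratio test applied to pass/fail trial outcomes. The details of the model actually selected are not given in this chapter, so the sketch is a generic stand-in under stated assumptions: each trial is scored simply as meeting or not meeting the criterion, and the values p0 = 0.5, p1 = 0.8, alpha = 0.1, and beta = 0.1 are hypothetical.

```python
import math


def sprt(outcomes, p0=0.5, p1=0.8, alpha=0.1, beta=0.1):
    """Wald sequential probability ratio test on pass/fail outcomes.

    p0    -- assumed pass probability of a non-proficient student
    p1    -- assumed pass probability of a proficient student (p1 > p0)
    alpha -- accepted risk of calling a non-proficient student proficient
    beta  -- accepted risk of calling a proficient student not proficient
    Returns (decision, number of trials used).
    """
    upper = math.log((1 - beta) / alpha)   # cross above: decide proficient
    lower = math.log(beta / (1 - alpha))   # cross below: decide not proficient
    llr = 0.0                              # running log-likelihood ratio
    for n, passed in enumerate(outcomes, start=1):
        if passed:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "proficient", n
        if llr <= lower:
            return "not proficient", n
    return "continue testing", len(outcomes)


# Made-up trial records: True means the trial met the criterion.
print(sprt([True, True, True, True, True, True]))
print(sprt([False, False, False, False]))
print(sprt([True, False]))
```

The appeal of a sequential scheme is that the number of trials is not fixed in advance: a student whose performance is clearly above or below the criterion is classified after a few trials, while a borderline student continues to be sampled.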

B.  ASSUMPTIONS 

This  section  will  present  some  underlying  assumptions  that 
are  necessary  for  performance  model  development  and  implemen¬ 
tation,  beginning  with  a  discussion  on  the  necessity  of  the 
A-6E  WST  to  realistically  duplicate  the  A-6E  CAINS  aircraft 
in  both  engineering  and  mission  aspects.  The  unique  role  of 
the  pilot  during  the  radar  navigation  mission  in  the  A-6E  WST 
is  discussed  with  respect  to  his  contribution  to  measuring  the 
B/N's performance. A discussion of the literature review with
respect to the relationship between results from pilot studies
and navigator performance is presented, followed by a discussion
of the need for the existence of a mathematical
relationship between measurement and operator behavior.
Finally, an assumption is stated concerning the relationship
between  motivated  and  experienced  aircrew  and  high  skill 
levels.  These  discussions  follow. 

1. Simulator Fidelity

It  is  assumed  that  the  A-6E  WST  represents  to  a  satis¬ 
factory  degree  those  elements  of  the  A-6E  CAINS  aircraft  such 
that the A-6E WST aircrew is confronted with a realistic
"duplication" of the operational situation and that the aircrew
should be required to perform as they would in actual
aircraft  flight.  Given  this  assumption,  training  and  per¬ 
formance  evaluation  can  be  effectively  achieved  in  the  simu¬ 
lator  for  most  B/N  activities. 

2. Pilot and B/N Relationship

The  A-6E  effectiveness  in  terms  of  crew-system  output 
is  a  function  of  pilot  and  B/N  crew  coordination.  Because  of 
the  major  role  of  the  B/N's  activities  in  achieving  the  desired 
mission  success  during  A-6E  radar  navigation,  it  is  assumed 
that  any  variability  within  the  A-6E  system  that  can  be  meas¬ 
ured  and  attributed  to  the  pilot  will  be  small.  In  effect, 
the  pilot's  function  within  this  mission,  scenario  and  segment 
will  be  to  "fly  system  steering,"  which,  for  the  most  part,  is 
the  result  of  the  B/N's  performance  as  a  navigator  and  systems 
operator.

3. Literature Review Results


The literature on aircrew performance measurement has for the
most part concentrated on the pilot. For similar missions,
scenarios and aircrew tasks, it is assumed that a significant
result in terms of performance measurement for a pilot will be
much the same for a navigator. This assumption does not extend
to the entire psychomotor domain of human operator performance,
but does draw some parallels from pilot research results to the
navigator. Although each position within the aircraft is somewhat
different, many similarities are assumed to exist in terms of
operator output, man-machine system output, and measures of
effectiveness.

4. Mathematical Description of Behavior

It  is  assumed  that  a  mathematical  relationship  exists 
between  some  aspects  of  operator  behavior  and  performance 
measurement  and  evaluation.  Most  likely,  for  the  multi¬ 
dimensional  aspects  of  behavior,  a  multi-descriptive  mathe¬ 
matical  result  would  best  describe  that  behavior  in  valid  and 
reliable  terms.  Objective  performance  measurement  relies  for 
the  most  part  on  numerical  and  statistical  analysis  of  opera¬ 
tor  and  system  outputs.  Thus,  this  assumption  is  necessary 
for  the  utilization  of  objective  performance  measurement  to 
measure and evaluate the B/N's control movements.

5. Experience and Skilled Behavior

When  properly  motivated  and  presented  with  a  realistic 
simulated  flight  mission  with  the  representative  flight  tasks, 
highly  experienced  ("fleet  qualified")  aircrews  are  assumed 
to  exhibit  skilled  behavior  of  an  advanced  stage  or  high  level 
that  is  characterized  by  minimum  effort  and  consistent  re¬ 
sponses  ordinarily  found  in  actual  aircraft  flight  for  the 
same  mission.  The  demonstrated  performance  of  highly  skilled 
aircrew,  under  this  assumption,  allows  for  the  establishment 
of  performance  standards  from  which  comparisons  can  be  made 
to  populations  of  aircrew  that  are  less  than  highly  skilled. 
Both  the  problem  of  motivated  behavior  and  establishment  of 
performance standards will be discussed in Chapter VII.

IV. THE FUNDAMENTAL NATURE OF PERFORMANCE EVALUATION


This section is not intended to be a definitive exposition
on performance measurement and evaluation theory.
However, certain basic concepts of performance measurement
and evaluation need to be defined and explained so that
subsequent chapters can be understood with minimum confusion.
The material is presented in logical sequence, beginning with
measurement theory and ending with some desirable
characteristics of a total performance measurement and
assessment system in the training environment.

Four  major  areas  of  performance  evaluation  will  be  dis¬ 
cussed  in  this  section:  measurement  considerations,  criteria 
considerations,  performance  measurement  considerations,  and 
performance  evaluation  considerations.  The  main  purpose  is 
to  show  that  measurement  and  criteria  are  needed  before  the 
evaluation  process  begins.  Measurement  considerations  include 
the  definition  and  purpose  of  measurement,  types  of  measures, 
levels  of  measurement,  transformations,  measurement  accuracy, 
reliability  and  validity  of  measurement,  and  the  selection 
of  initial  measures  for  man-machine  performance.  The  area  of 
criteria  considerations  addresses  the  definition  and  purpose 
of  criteria,  types  of  criteria,  characteristics  of  criteria, 
establishing  criteria,  sources  of  criterion  error,  measures 
of effectiveness, and selection of criteria. Performance
measurement  considerations  include  other  aspects  of  perfor¬ 
mance  measurement  such  as:  subjective  versus  objective 
measures,  combining  measures,  overall  versus  diagnostic 
measures,  individual  versus  crew  performance,  and  training 
measures.  The  last  area  of  this  section,  performance  evalu¬ 
ation  considerations,  shows  how  evaluation  depends  upon  meas¬ 
urement  and  criteria,  and  discusses  the  definition  and  purpose 
of  performance  evaluation,  types  of  evaluation,  accuracy  of 
evaluation,  evaluating  individual  and  group  differences,  and 
the  characteristics  of  evaluation. 

The  reader  already  familiar  with  the  above  material  may 
wish  to  skip  ahead  to  the  next  section.  Others  not  familiar 
will  need  the  theory  to  aid  understanding  of  subsequent 
chapters.

A.  MEASUREMENT  CONSIDERATIONS 

1. Definition and Purpose of Measurement

Measurement  is  information  about  performance  for  a 
specific  purpose,  such  as  whether  or  not  a  student  is  "mission 
ready"  to  navigate  a  particular  aircraft  [Vreuls  and  Cotton, 
1980]. Unfortunately, this definition leaves open the serious
question of quantification: just what is measured, how is it
measured, and how are the raw data then transformed into useful
information? In elemental measurement theory, measurement
involves the assignment of a class of numerals to a class of
objects, where the class of objects becomes human behavior, the
class of numerals must be defined, and some sort of rules for
assigning the numerals to the objects must exist [Lorge, 1951;
Forrest, 1970]. Measurement is thus the abstract concept of
"mapping" a class or set of numerals to a class or set of human
behaviors or performance, which makes the concept more
quantifiable in nature. All measurements are estimates of the
true value or actual amount of the human behavior possessed at a
given point in time [Smode, et al., 1962]. The difficulty in
the measurement of human behavior increases when the important
aspects of the behavior being measured are more qualitative
than quantitative in nature [Glaser and Klaus, 1966].
Measurement requires the action of observation, where behavior
is noticed or perceived and recorded. Glaser and Klaus [1966]
and Lorge [1951] noted that some observations can be made
directly, involving perceptions of the behavior or of the
behavior's properties, whereas other observations can only be
estimated from inferences about the behavior, or its properties,
from its effects on other system components. The steps
to measurement, as outlined by Forrest [1970], include:

(1)  Determine  the  specific  object  or  aspect  to  be 
measured. 

(2)  Locate  or  expose  the  particular  object  or  aspect 
to  view. 

(3)  Apply  a  measurement  scale. 

(4) Express the results as a dimension.

Measurement must precede the activity of performance
evaluation,  which  is  the  process  of  interpreting  the  results 
of  measurement  and  comparing  them  to  an  established  standard. 
Measurement  in  the  training  context  serves  a  variety  of  func¬ 
tions  which  emphasize  either  achievement  (present  knowledge 
or  skill  level)  or  prediction  (expected  performance  under 
specified conditions). Several specific purposes of training
performance measurement, as outlined by Smode, et al. [1962],
Buckhout and Cotterman [1963], Glaser and Klaus [1966], Vreuls
and Obermayer [1971], Farrell [1974], and Vreuls, et al. [1975],
are enumerated below:

(1)  Prediction  of  future  performance  of  a  student  for  a 
specified  future  operational  setting. 

(2)  Present  performance  evaluation  of  knowledge,  skill 
level,  or  performance  level  of  a  student. 

(3)  Learning  rate  evaluation  at  several  points  in  a 
training  program  to  provide  a  basis  for  judging  a 
student's  present  stage  of  learning  for  subsequent 
advancement  to  the  next  training  phase. 

(4)  Diagnostic  identification  of  strengths  and  weak¬ 
nesses  of  a  student  so  that  additional  training 
may  occur. 

(5)  Training  effectiveness  resulting  from  the  nature 
and  extent  of  training  syllabus  or  course  material 
changes.

(6)  Criterion  information  necessary  for  defining  what 
constitutes  successful  or  proficient  performance. 

(7)  Functional  requirements  for  future  training 
equipment. 

(8)  Selection  and  placement  of  the  student  with  an 
achieved  level  of  proficiency  to  a  position  or 
mission with a required minimum proficiency level.

2. Types of Measures


Measurement  is  a  process  of  producing  raw  data  in  the 
form  of  measures  (parameters  or  variables)  as  a  result  of 
observing  human  behavior  in  a  man-machine  system.  Measures 
are  quantities  which  can  take  on  any  of  the  numbers  of  some 
set  and  which  usually  vary  along  a  defined  continuum,  or  scale 
[Knoop, 1968]. The classification of measures is commonly
done by using the characteristics of the measures themselves.
Using taxonomies developed by Smode, et al. [1962], Angell,
et al. [1964], and Vreuls and Obermayer [1971], measures may
be  grouped  into  several  major  classes  with  a  collection  of 
like  measures  within  each  class.  The  major  classes  are  listed 
and  briefly  defined  below: 

(1)  Time  periods  in  output  or  performance. 

(2)  Accuracy  or  correctness  of  output  or  performance. 

(3)  Frequency  of  occurrence  or  the  rate  of  repetition 
of  behavior. 

(4)  Amount  achieved  or  accomplished  in  output  or 
performance. 

(5)  Quantity  used  or  resources  expended  in  performance 
in  terms  of  standard  references. 

(6)  Behavior  categorization  by  observers  (subjective 
measurement).

(7)  Condition  or  state  of  the  individual  in  relation 
to  the  task  which  describes  the  behaviors  and 
results  of  that  behavior  on  system  output. 

This classification produced approximately 83 measures within
the seven major classes; these are listed completely in Smode,
et al. [1962]. A more recent taxonomy by Mixon and Moroney
[1981] grouped objective-only aircrew performance measures
into the following six major classes:

(1)  Physiological  outputs  from  the  operator. 

(2)  Aircraft  systems,  instruments  or  equipment. 

(3)  Man-machine  system  output  within  the  operating 
environment.

(4)  Time  periods  in  output  or  performance. 

(5)  Frequency  of  occurrence  or  the  rate  of  repetition 
of  behavior. 

(6)  Combined  overall  measures. 

These measures were obtained from an extensive literature
review of aircrew performance measurement spanning the years
1962-1980. Table II lists the measures, over 180 in all, under
each major class. It is interesting to note that all of these
measures were obtained from actual aircrew performance field or
laboratory results, in contrast to previous listings of proposed
or candidate measures. Some measures listed in Table II will
be used as candidate measures for B/N performance during radar
navigation in the WST (see Table XI).

TABLE II: A TAXONOMY OF MEASURES

PHYSIOLOGICAL OUTPUTS
    Biochemical analysis (blood)
    Cardiovascular
    Communications/Speech
    Electroencephalogram (EEG)
    Electromyogram (EMG)
    Eye movements
    Finger tremor
    Galvanic Skin Response (GSR)
    Metabolic rate
    Perspiration weight loss
    Pupillography
    Respiration
    Temperature
    Time of sleep
    Urinary catecholamines
    Visual Evoked Potential (VEP)

AIRCRAFT SYSTEMS
    ADI displacement
    Aileron
    Aircraft gross weight
    Angle of attack
    Approach glideslope display error
    Approach localizer display error
    Automatic Direction Finder (ADF)
    Autopilot vertical tracking error
    Ball angle
    CDI error
    Collective
    Control stick
    Cyclic
    DME
    Elevator
    Engine Pressure Ratio (EPR)
    Engine RPM
    Flaps
    Flight director error
    Fuel flow
    Heading
    Inclinometer
    Landing gear
    Pedal (helicopter)
    Radar altimeter error
    Rotor RPM
    Rudder
    Speed brake
    Tail rotor position
    Throttle
    Thrust attenuator
    Thumbwheel inputs
    Trim

MAN-MACHINE SYSTEM OUTPUT
    Acceleration
    ACM plane of action (X,Y,Z)
    Aircraft/boom oscillations
    Airspeed
    Altitude
    Altitude (pitchout)
    Altitude (radar)
    Approach angle error
    Approach centerline error
    Approach glideslope error
    Approach range
    Checklist errors
    Circular bomb error
    Closing speed
    Course error
    Crosstrack error
    Deviations from ideal flight path
    Dip to target error (ASW)
    Distance traveled
    Dive angle at bomb release
    Drift
    Emergency detections
    Energy Management Index (EMI)
    Ground speed
    Ground track
    Landing aim point
    Landing attitude
    Ldg. dist. to ideal touchdown point
    Ldg. dist. to runway threshold
    Ldg. height at runway threshold
    Landing result (flare, bolter, ...)
    Lateral acceleration
    Lateral velocity
    Mach number
    Maneuvering rate
    Miss distance (ASW)
    Navigational accuracy (LAT/LONG)
    Pitch
    Pitch/roll coordination
    Pointing angle advantage
    Position estimation
    Positional error (formations)
    Power
    Prob. of finding turn point
    Prob. of target acquisition
    Prob. of target detection
    Procedural errors
    Range at target detection
    Range at target identification
    Range at target recognition
    Rate of information processing
    Ratio: carrier accidents
    Ratio: carrier bolters
    Ratio: carrier bolter rate
    Reaction to an event
    Roll
    Sideslip
    Takeoff position error
    Torque
    Tracking error
    Turn errors
    Turn rate
    Vertical acceleration
    Vertical velocity
    Yaw

TIME
    Combined total seconds of error
    Defensive time
    Lead time
    Offensive time
    Offensive time with advantage
    Opponent out of view time
    Ratio: offensive/defensive time
    Reaction time to an event
    Time
    Time estimation
    Time of event
    Time of task execution
    Time on criterion
    Time on target
    Time to acquire target
    Time to criterion
    Time to detect target
    Time to envelope
    Time to first kill
    Time to identify target
    Time to recover from unusual attack
    Time to turn
    Time within criterion
    Time within envelope
    Time within flight path
    Time within gun range
    Time within missile range

FREQUENCY
    No. of aircraft ground impacts
    No. of collisions (formations)
    No. of control losses
    No. of control reversals
    No. of correct decisions
    No. of correct responses
    No. of correct target acquisitions
    No. of correct target classifications
    No. of correct target detections
    No. of correct target identifications
    No. of correct trials
    No. of course corrections
    No. of crossovers
    No. of errors per trial
    No. of errors to criterion
    No. of false target detections
    No. of false target identifications
    No. of gun hits/kills
    No. of incorrect control inputs
    No. of incorrect decisions
    No. of lost target contacts (ASW)
    No. of missile hits/kills
    No. of overshoots
    No. of refueling disconnects
    No. of qualifying (criterion) bombs
    No. of scorable bombs
    No. of successful unusual attack rec.
    No. of taps (secondary task)
    No. of target detections (no fires)
    No. of target hits
    No. of target kills
    No. of target misses
    No. of times inside criterion
    No. of times off target
    No. of times outside criterion
    No. of turn points found
    No. of turns to assigned heading

COMBINED OVERALL MEASURES
    Good Stick Index (GSI)
    Landing Performance Score (LPS)
    Objective Mission Success Score (OMSS)
    Trials to criterion

Source: Mixon and Moroney [1981].

a. Distributions of Measures

The process of measuring continuous and discrete human
behavior results in a sample of measures that are estimators
of the actual operator behaviors in the system being examined.
The usefulness of the measures becomes apparent when they are
used to describe the behavior as "good" or "bad" when compared
to a reference or standard measure (criterion). Statistical
techniques are used to transform raw measures into useful forms
of information for the comparison or evaluation process. Since
all measures are, in effect, random variables in a statistical
sense, it is important to determine the family, or distribution
population, that characterizes the particular measure being
examined before performing any statistical operations. Two such
example distributions would be the Exponential (time measures)
and the Normal or Gaussian (accuracy measures). Each
distribution has preferred statistical summary estimators that
use the generated measures (data) in order to best estimate the
actual or true operator performance dimension at hand.

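
The dependence of the summary estimator on the assumed distribution can be illustrated with a minimal Python sketch. It is not part of the measurement system described here; the function names and the sample data are hypothetical. It assumes time measures are Exponential, summarized by the sample mean and its reciprocal (the rate), and accuracy measures are Normal, summarized by the sample mean and standard deviation.

```python
import statistics


def summarize_time_measure(times_s):
    """Summarize a time measure assumed to follow an Exponential distribution.

    The sample mean is the maximum-likelihood estimate of the mean time;
    its reciprocal estimates the rate parameter.
    """
    mean_time = statistics.fmean(times_s)
    return {"mean": mean_time, "rate": 1.0 / mean_time}


def summarize_accuracy_measure(errors):
    """Summarize an accuracy measure assumed to follow a Normal distribution."""
    return {
        "mean": statistics.fmean(errors),
        "std_dev": statistics.stdev(errors),  # sample value, n - 1 denominator
    }


# Hypothetical data: seconds needed to obtain a fix, and crosstrack error (nm).
fix_times_s = [12.0, 30.0, 8.0, 50.0, 20.0]
crosstrack_errors_nm = [-0.5, 0.25, 0.0, 0.75, -0.5]

time_summary = summarize_time_measure(fix_times_s)
accuracy_summary = summarize_accuracy_measure(crosstrack_errors_nm)
print(time_summary)
print(accuracy_summary)
```

In practice the distribution family would first be checked against recorded data rather than assumed.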
b.  Error  Measures 

Accuracy,  or  error  measures,  are  of  special  interest 
in  aircrew  performance  measurement  due  to  the  obvious  relation¬ 
ship  between  operator  error  and  aircraft  accidents.  These 
unique  measures  are  usually  expressed  as  a  measurable  deviation 
of  a  variable  from  an  established  or  arbitrary  reference  point, 
and  have  been  of  great  interest  to  the  aviation  accident  in¬ 
vestigation  community  [Hitchcock,  1966;  Ricketson,  1974]. 
Chapanis [1951] dichotomized errors as basically constant or
variable. Constant errors indicate the difference between a
statistical estimator of a quantity and the true, or expected,
value; variable errors are described by a statistical
estimator of measure spread or dispersion. That study
concluded that variable errors indicated the actual instability
of the man-machine relationship and thus were more of a
contributing factor to accidents.
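
The distinction between constant and variable error can be shown with a short Python sketch. The choice of estimators is an assumption made for illustration: constant error is taken as the mean signed deviation from the reference, and variable error as the sample standard deviation. The altitude data are hypothetical.

```python
import statistics


def constant_error(observed, reference):
    """Mean signed deviation of the observations from the reference value."""
    return statistics.fmean([x - reference for x in observed])


def variable_error(observed):
    """Sample standard deviation: spread of the observations about their own mean."""
    return statistics.stdev(observed)


# Hypothetical altitude samples (feet) against a 500 ft reference altitude.
reference_ft = 500.0
steady_but_offset = [540.0, 541.0, 539.0, 540.0]
centered_but_unstable = [450.0, 550.0, 440.0, 560.0]

for label, data in [("steady but offset", steady_but_offset),
                    ("centered but unstable", centered_but_unstable)]:
    print(label, constant_error(data, reference_ft), variable_error(data))
```

The first operator shows a large constant error with almost no variable error; the second shows no constant error but a large variable error, the pattern that Chapanis associated with instability of the man-machine relationship.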

c. Aircrew-Aircraft Measurement

A descriptive structure for flight crew performance
measurement, relating system performance and human behavior to
segments of maneuvers which constitute a flight mission, was
recently developed by Vreuls and Cotton [1980], derived from
earlier work by Benenatti, et al. [1962], Knoop [1968], and
Vreuls, et al. [1973]. The structure states that a measure
has meaning only when specified fully by the following
five determinants:

(1)  Measure  segment. 

(2) System state variable(s) and their scaling.

(3)  Sampling  rates  for  continuous  variables. 

(4)  Desired  values. 

(5)  Transformations. 

Measure segments are any portions of flight for
which the desired behavior of the system follows a lawful
relationship from an unambiguously defined beginning to end.
Segments are related closely to specific behavioral objectives
(from the ISD training approach), may but need not coincide
with a maneuver segment, and are defined explicitly
by start/stop logic. System state variables, the measures
previously discussed, are scaled so that their entire dynamic
range is represented without information loss. Scaling will
be discussed along with desired values and transformations
later in this section. Sampling rates are the temporal
frequency at which a measure is recorded or observed by the
measurement system. One sampling rate guideline is to record
data faster than the natural frequency response for the
specific axis in which the measurement is being made, although
others are proposed [Vreuls and Cotton, 1980].
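
As an illustration only, the five determinants can be collected into a single measure specification. The Python sketch below is hypothetical: the class and field names, the altitude example, and the choice of root-mean-squared error as the transformation are assumptions, not the structure published by Vreuls and Cotton.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Sample:
    t: float      # time since mission start, seconds
    value: float  # raw state-variable reading


@dataclass
class MeasureSpec:
    name: str
    in_segment: Callable[[Sample], bool]           # (1) measure segment start/stop logic
    scale: Tuple[float, float]                     # (2) state-variable dynamic range
    sample_rate_hz: float                          # (3) sampling rate (stored for reference only)
    desired_value: float                           # (4) desired value
    transform: Callable[[Sequence[float]], float]  # (5) transformation

    def compute(self, samples: Sequence[Sample]) -> float:
        lo, hi = self.scale
        errors: List[float] = []
        for s in samples:
            if not self.in_segment(s):
                continue  # sample lies outside the measure segment
            if not lo <= s.value <= hi:
                # A reading outside the declared range would be lost by the scaling.
                raise ValueError(f"{self.name}: {s.value} outside scale {self.scale}")
            errors.append(s.value - self.desired_value)
        return self.transform(errors)


def rms(errors: Sequence[float]) -> float:
    """Root-mean-squared error over the segment."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical altitude-hold measure over the segment 10 s <= t <= 13 s.
altitude_hold = MeasureSpec(
    name="altitude error",
    in_segment=lambda s: 10.0 <= s.t <= 13.0,
    scale=(0.0, 64000.0),
    sample_rate_hz=1.0,
    desired_value=500.0,
    transform=rms,
)

recording = [Sample(9.0, 700.0), Sample(10.0, 503.0), Sample(11.0, 497.0),
             Sample(12.0, 504.0), Sample(13.0, 496.0)]
rms_error_ft = altitude_hold.compute(recording)
print(rms_error_ft)
```

The sample taken at 9 seconds is ignored because it precedes the segment start; only the four in-segment samples contribute to the result.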

3. Levels of Measurement

In  examining  the  nature  of  performance  measurement, 
four  levels  can  be  distinguished.  The  level  of  scale  deter¬ 
mines  the  amount  of  information  resulting  from  the  measurement 
and the mathematical and statistical operations that are
permissible within that level. Table III, adapted from Lorge
[1951], Siegel [1956], and Smode, et al. [1962], lists
each level and the applicable characteristics associated with
each. A brief description of each level, as analyzed by Smode,
et al. [1962], is given below:

a.  Nominal  measurement  scales  have  units  being 
measured  placed  into  classes  or  categories.  Units  placed  in 
the  same  class  are  considered  equivalent  along  some  dimension 
or  in  some  respect. 

b. Ordinal measurement scales have units assigned a
rank order. Nominal categories are now ranked with respect
to each other in terms of an "amount possessed" of the quantity
being measured, with judgements assessing the amount possessed
by the units involved. Rankings can be composed of
unidimensional or multidimensional variables. In the latter
case, a composite judgement or ordering is performed which
essentially places a multidimensional variation on a
unidimensional linear scale.

TABLE III: LEVELS OF MEASUREMENT

Source: Lorge [1951], Siegel [1956], and Smode, et al. [1962].


c. Interval measurement scales have units being
measured in equidistant terms. In addition to an indication
of rank order or direction, there is also an indication
of the amount or size of difference between units on the
scale. Since the zero point is usually arbitrary, it does
not represent complete absence of the property being measured.

d. Ratio measurement scales are an extension of the
interval scale with a natural, absolute zero point, and
represent the highest measurement level. It is at this level
that the most powerful statistical tests of significance are
available when evaluating performance measures.

The determination of a parameter or measure of performance
should include the units of scaling, e.g., 0 to 640
knots (air speed) and 0 to 64000 feet (pressure altitude).
Without a clear definition of the scaling units, the improper
use of statistical operations or tests may occur, causing the
measurement of performance to provide irrelevant or erroneous
results.
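
The way a scale level restricts the permissible statistical operations can be sketched in Python. The pairing of statistics with levels below follows the conventional textbook assignment and is an assumption for illustration; it does not reproduce Table III, and all names and data are hypothetical.

```python
import statistics
from enum import IntEnum


class Level(IntEnum):
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


def coefficient_of_variation(values):
    """Spread relative to the mean; needs a true zero point."""
    return statistics.stdev(values) / statistics.fmean(values)


# Statistic name -> (lowest level at which it is meaningful, function).
STATISTICS = {
    "mode": (Level.NOMINAL, statistics.mode),
    "median": (Level.ORDINAL, statistics.median),
    "mean": (Level.INTERVAL, statistics.fmean),
    "coefficient_of_variation": (Level.RATIO, coefficient_of_variation),
}


def summarize(values, level, statistic):
    """Apply a statistic only if the scale level permits it."""
    required, func = STATISTICS[statistic]
    if level < required:
        raise ValueError(
            f"{statistic} is not meaningful on a {level.name.lower()} scale")
    return func(values)


# Hypothetical data: instructor grades (ordinal) and airspeed in knots (ratio).
grades = [1, 2, 2, 3, 4]
airspeed_kt = [300.0, 310.0, 290.0]

print(summarize(grades, Level.ORDINAL, "median"))
print(summarize(airspeed_kt, Level.RATIO, "mean"))
```

Requesting the mean of the ordinal grades raises an error, which is the kind of improper statistical operation that a clear definition of scaling is meant to prevent.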

4. Measure Transformations

Measures  are  observed,  recorded,  and  usually  subjected 
to  transformation,  which  is  any  mathematical,  logical,  or 
statistical  treatment  designed  to  convert  raw  measures  into 
usable information [Vreuls and Cotton, 1980]. When measures
are discrete, the transformation may be the actual value,
absolute value, or a tolerance band. When measures are
continuous, transformations may include means, variances,
frequency content, departures from norms, or several others
as listed in Table IV.
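
A few of the time-history and amplitude-distribution transformations can be computed from a sampled continuous measure as in the following Python sketch. The function name, the tolerance value, and the heading data are hypothetical, and a fixed sampling interval is assumed.

```python
import math


def time_history_transforms(samples, desired, tolerance, dt):
    """Compute several common transformations of a sampled continuous measure.

    samples:   raw values recorded every dt seconds
    desired:   desired value of the state variable
    tolerance: half-width of the tolerance band about the desired value
    """
    errors = [x - desired for x in samples]
    out = [abs(e) > tolerance for e in errors]  # True where out of tolerance
    n_out = sum(out)
    return {
        "time_out_of_tolerance": n_out * dt,
        "percent_time_in_tolerance": 100.0 * (len(out) - n_out) / len(out),
        "max_value_out_of_tolerance": max(
            (abs(e) for e, o in zip(errors, out) if o), default=0.0),
        "tolerance_band_crossings": sum(
            1 for a, b in zip(out, out[1:]) if a != b),
        "rms_error": math.sqrt(sum(e * e for e in errors) / len(errors)),
    }


# Hypothetical heading samples (degrees) recorded every 0.5 s;
# desired heading 090 with a tolerance band of +/- 5 degrees.
heading_deg = [90.0, 93.0, 97.0, 98.0, 94.0, 88.0, 84.0, 90.0]
result = time_history_transforms(heading_deg, desired=90.0, tolerance=5.0, dt=0.5)
print(result)
```

Each entry in the result corresponds to one transformation of the same raw measure, which shows how a single recorded variable can yield several candidate performance measures.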

The relationship between the distribution of a measure
or estimate and transformations is well known to statisticians
but sometimes not very clear to others. For a given
population distribution, there exist unbiased and consistent
estimators (transformations) that will best describe the true
value or quantity that is being estimated. Conversely, some
transformations are not applicable for a given population
and are, in that sense, useless. The interested reader is
referred to Mood, et al. [1974] for a detailed analysis of
applicable transformations for a known population distribution.
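
The effect of estimator choice can be demonstrated by simulation. The Python sketch below assumes a Normal population with a known variance of 4.0 and compares two variance estimators over many small samples; the sample size, trial count, and seed are arbitrary illustrative choices.

```python
import random
import statistics


def average_estimate(estimator, true_sd, n, trials, seed=0):
    """Average an estimator over many simulated samples of size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, true_sd) for _ in range(n)]
        total += estimator(sample)
    return total / trials


# The true population variance is 2.0 ** 2 = 4.0.
# pvariance divides by n (biased for a sample); variance divides by n - 1.
biased = average_estimate(statistics.pvariance, true_sd=2.0, n=5, trials=5000)
unbiased = average_estimate(statistics.variance, true_sd=2.0, n=5, trials=5000)
print(biased, unbiased)
```

With samples of five observations, the estimator that divides by n averages near 3.2 and the one that divides by n - 1 averages near the true value of 4.0, so the first systematically understates the spread of the measure.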

5. Accuracy of Measurement

Measurement  produces  measures  which  are  sample  esti¬ 
mators  of  the  true  value  or  actual  amount  of  the  quantity 
possessed.  Accuracy  of  measurement  refers  to  how  close  an 
estimator  is  to  the  true  value.  All  measurement  systems  are 
subject  to  accuracy  problems  as  discussed  below. 

a.  Measuring  aircrew  behavior  is  confounded  by  the 
systematic  influence  of  the  total  operating  system,  since  the 
measures  obtained  are  frequently  determined  to  some  extent  by 
the  performance  of  other  components  in  the  system  [Glaser  and 
Klaus, 1966].

b. Any statistic, or known function of observable
random variables that is itself a random variable, whose values
are used to estimate a true quantity is an estimator of that
quantity and may be subject to biased or inconsistent
properties. For all distributions of measures that have been
identified during measurement, there exists at least one
unbiased and consistent estimator that will be closer to the
true value of the quantity being measured than any other
estimator. Using the incorrect estimator for a known
population will, in effect, result in avoidable measurement
inaccuracies [Krendel and Bloom, 1963; Mood, et al., 1974].

TABLE IV: COMMON MEASURE TRANSFORMATIONS

TIME HISTORY MEASURES
    Time on target
    Time out of tolerance
    Percent time in tolerance
    Maximum value out of tolerance
    Response time, rise time, overshoot
    Frequency domain approximations
    Count of tolerance band crossing
    Zero or average value crossings
    Derivative sign reversals
    Damping ratio

AMPLITUDE-DISTRIBUTION MEASURES
    Mean, median, mode
    Standard deviation, variance, range
    Minimum/maximum value
    Root-mean-squared error
    Absolute average error

FREQUENCY DOMAIN MEASURES
    Autocorrelation function
    Power spectral density function
    Bandwidth
    Peak power
    Low/high frequency power
    Bode plots, Fourier coefficients
    Amplitude ratio
    Phase shift
    Transfer function model parameters
    Quasi-linear describing function
    Cross-over model

BINARY, YES/NO MEASURES
    Switch activation sequences
    Segmentation sequences
    Procedural/decisional sequences

Source: Vreuls and Cotton [1980].

There is no simple way to assure measurement accuracy,
but several techniques that may improve the accuracy of
measurement are described below.

c.  Scope  of  Measurements 

Accuracy  will  increase  as  a  result  of  increasing 
the  inclusiveness  or  completeness  of  the  measures  to  include 
all  relevant  behavior  and  system  components.  Skilled  perfor¬ 
mance  in  an  aircraft  normally  involves  complex  behaviors  that 
require  a  wide  scope  of  measurement  to  accurately  estimate 
the existing true performance level [Smode, et al., 1962].

d.  Number  of  Measurements 

For  most  situations,  the  greater  the  number  of 
observations  involved  in  deriving  an  estimator,  the  closer 
the estimator will be to the true value [Mood, et al., 1974].
Increasing  the  number  of  observations  when  there  is  a  large 
variability  in  the  individual  measurements  will  reduce  the 
variability  and  minimize  the  effects  of  random  or  chance 
variations  which  may  occur  from  measurement  to  measurement 
[Smode, et al., 1962]. Large sample sizes also are desirable
when establishing standards or references (criteria) for
evaluating performance [Krendel and Bloom, 1963].

e.  Controlled  Conditions  of  Measurement 

By ensuring that the desired measurement conditions
are controlled, measurement accuracy may be improved. This
may be done by defining those factors which are to be present
and varied systematically, maintaining uniformity of
conditions during measurement in order to reduce bias and
unwanted variability, and ensuring the intended measurements
are taken correctly [Smode, et al., 1962].

6. Reliability of Measurement

Reliability  refers  to  the  accuracy  of  measurement  or 
the  consistency  or  stability  of  the  recorded  and  statistical 
data  upon  repetition  [Glaser  and  Klaus,  1966;  Knoop  and  Welde, 
1973;  Grodsky,  1967;  Thorndike,  1951].  When  the  dispersion, 
or  spread,  of  measures  obtained  from  one  individual  on  a  par¬ 
ticular  task  is  large,  the  measures  lack  reliability,  and 
any  statistics  that  are  formed  from  the  measures  that  are  used 
in  evaluation  will  be  incapable  of  differentiating  consistently 
among  individuals  who  are  at  different  skill  levels.  If  the 
measures  are  precise,  resulting  in  a  statistic  that  is  stable, 
an  individual  would  receive  exactly  the  same  evaluation  score 
each time his performance was measured [Glaser and Klaus, 1966;
Thorndike, 1951].

a. Computation of Reliability

Thorndike  [1951]  cautioned:  "There  is  no  single, 
universal,  and  absolute  reliability  coefficient  for  a  test. 
Determination  of  reliability  is  as  much  a  logical  as  a 
statistical  problem."  Several  methods  of  approximating 
reliability  have  since  been  proposed  or  utilized  in  aircrew 
measurement. Smode, et al. [1962] classified reliability
expressions as either absolute or relative and suggested
using the standard error of measurement (an absolute measure of
precision), the coefficient of internal consistency (which uses a
single set of observations), the coefficient of stability (which
measures agreement over time), and the coefficient of equivalence
(agreement between two like measures) as statistical computations of
reliability. Glaser and Klaus [1966] dichotomized reliability
assessment into two methods, test-retest and alternate form,
and recommended computing reliability by using the statistical
correlation coefficient. Some recent statistical techniques -
Winsorization and trimming - may provide a better reliability
approximation than was previously possible. Winsorization
and trimming reduce the effects of large variability in a
sample of measures by replacing or discarding its extreme values,
with virtually no loss in information
[Dixon and Massey, 1969]. These techniques would appear
to be quite useful for reliability calculations, although their
actual use in aircrew performance measurement has not yet been
demonstrated.
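
The computations named above can be sketched briefly. The example below is illustrative only and assumes the usual textbook forms: the Pearson correlation for test-retest reliability, the classical standard error of measurement (score standard deviation times the square root of one minus reliability), and symmetric trimming and Winsorization of the k most extreme values at each end. All function names and score values are hypothetical and are not taken from the cited sources.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation between paired scores (e.g., test and retest)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples of at least two scores")
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / math.sqrt(sxx * syy)


def standard_error_of_measurement(scores, reliability):
    """Classical form: standard deviation of scores times sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return statistics.stdev(scores) * math.sqrt(1.0 - reliability)


def trim(values, k):
    """Discard the k smallest and k largest values; return the sorted remainder."""
    ordered = sorted(values)
    if k < 0 or 2 * k >= len(ordered):
        raise ValueError("k must leave at least one value")
    return ordered[k:len(ordered) - k]


def winsorize(values, k):
    """Replace the k smallest and k largest values with the nearest retained values."""
    kept = trim(values, k)
    return [kept[0]] * k + kept + [kept[-1]] * k


# Hypothetical scores for five students on two administrations of one task.
first_trial = [70, 82, 65, 90, 75]
retest = [72, 80, 68, 88, 77]
r = pearson_r(first_trial, retest)
sem = standard_error_of_measurement(first_trial, r)
print(f"test-retest r = {r:.3f}, standard error of measurement = {sem:.2f}")

# Hypothetical miss distances with one extreme value.
miss_distances = [12, 15, 14, 13, 95]
print("trimmed:", trim(miss_distances, 1))
print("Winsorized:", winsorize(miss_distances, 1))
```

Trimming shortens the sample, while Winsorization keeps the sample size and only pulls the extreme values inward; which is preferable depends on how the resulting statistic will be used.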

b. Sources of Unreliability

The sources of measurement accuracy problems, as
previously discussed, are also sources of unreliability.
Other sources inherent in the measurement of human behavior,
which hinder reliable measurement of aircrew performance,
are given by Glaser and Klaus [1966] and Thorndike [1951] and
include the following:

(1) The environment in which performance is
being measured influences measurement variability. Differences
in weather, amount of turbulence, wind direction and
velocity, unexpected noise, equipment malfunction, and extreme
temperatures are factors contributing to unreliability.

(2) Equipment used for measurement or personnel
participating in performance assessment are sources of
unreliability.

(3) The complexity of the behavior being evaluated
influences reliability. Since the behavior being
measured and evaluated involves many dimensions of performance,
and any individual's skill level may fluctuate considerably
from one dimension to the next, each component dimension is
susceptible to all previously discussed sources of
unreliability.

(4) The performance of the individual being
assessed fluctuates as a function of temporary variations in
the state of the organism. Some factors frequently involved
that decrease measurement reliability are: individual
motivation, emotional state, fatigue, stress, test-taking
ability, and circadian rhythm.

c.  Improving  Reliability  of  Measurement 

In any training situation, some degree of reliability
in the measure of the ability being trained is necessary
if any evidence of improvement is required. By reducing
chance factors or variability in measurement, reliability can
be improved. The techniques for improving the accuracy of
measurement, as previously discussed, also contribute towards
improving reliability. Knoop and Welde [1973] suggested other
steps to improve reliability, which are listed below:

(1)  Calibration  of  performance  measurement  equipment 
should  be  conducted  on  a  continuing  basis. 

(2)  Software  processes  involved  in  data  collection, 
reduction,  conversion,  analysis,  and  plotting 
should be validated and monitored to avoid data
loss.

(3)  Accurate  records  of  flight  conditions  and  mission 
requirements  should  be  maintained  to  facilitate 
measurement  interpretation. 

7. Validity of Measurement

Validity  is  the  degree  to  which  the  measurement  or 
evaluation  process  correctly  measures  the  variable  or  property 
intended to be measured [Knoop and Welde, 1973]. With regard
to the evaluation process, validity has two aspects: (1)
relevance or closeness of agreement between what the test
measures and the function that it is used to measure, and
(2) reliability or the accuracy and consistency with which
the test measures whatever it does measure in the group with
which it is used [Cureton, 1951]. In the training environment,
a performance test is a stimulus situation constructed
to  evoke  the  particular  kinds  of  operator  behavior  to  be 
measured  or  assessed.  The  validity  of  a  performance  test  is 
established  by  demonstrating  that  the  test  results  reflect 
differences  in  skill  levels  of  the  performance  being  assessed 
[Glaser  and  Klaus,  1966]. 

a.  Types  of  Validity 

Four  types  of  validity  which  have  applicability 
to  performance  measurement  in  general  have  been  described  by 
Smode, et al. [1962], McCoy [1963], and Chiles [1977] as
follows:

(1)  Predictive  validity  refers  to  the  correlational 
agreement  between  obtained  measures  and  future 
status  on  some  task  or  dimension  external  to  the 
measurement  and  requires  statistical  operations 
for  evaluation. 

(2)  Concurrent  validity  refers  to  the  correlational 
agreement  between  obtained  measures  and  the  present 
status  of  the  units  being  measured  on  some  task  or 
dimension  external  to  the  measurement  and  also 
requires  statistical  computation. 

(3)  Content  validity  is  based  on  expert  opinion  and  is 
evaluated  by  qualified  people  determining  if  the 
measures  to  be  taken  truly  sample  the  types  of 
performance or subject matter about which conclusions
will be drawn.

(4)  Construct  validity  is  a  logical  process  where  the 
emphasis  is  on  trait,  quality,  or  ability  presumed 
to  underlie  the  measures  being  taken.  While  the 
measures  themselves  do  not  reflect  directly  the 
performance  to  be  evaluated,  they  are  accepted  to 
be  valid  on  the  basis  of  logical  considerations  and 
related empirical evidence.

b. Validity of Measurement in the Simulator

Without empirical or judgmental evidence, the use
of full-scale state-of-the-art flight simulation provides
maximum face validity, where the performance evaluation situation
in the simulator appears to duplicate the actual task
of flying or navigating an aircraft [Alluisi, 1967; Chiles,
1977]. Bowen, et al. [1966], in an experiment using twenty
A-4 pilots to study and assess pilot proficiency in an Operational
Flight Trainer (OFT; device 2F62), found that:

For  measures  taken  in  the  OFT  to  be  valid,  the  task  set 
to  the  pilot  should  be  multiple  in  nature  and  have  a 
considerable  difficulty  level  equivalent  to  the  more 
difficult levels found in actual flight. In this manner,
the pilot is more likely to display his skills in
the  same  pattern  of  priority  (i.e.,  time-sharing, 
attention-shifting,  standard-setting,  etc.)  as  he  does 
in  actual  flight. 

This conclusion is also supported by Kelley and Wargo [1968]
and is consistent with the relevance component of validity
discussed previously by Cureton [1951].

c.  Improving  Validity  of  Measurement 

A  high  degree  of  validity  is  essential  to  the 
effectiveness of any measurement system. Validity
can be improved by increasing relevance, reliability, or
both. Relevance may be increased by making the
simulation more closely resemble the actual
aircraft, or by making the simulated task or mission
more closely resemble the actual task or mission environment.
Reliability improvements were discussed in the previous
section.

d. Relationship of Validity and Reliability

Validity  has  been  described  as  having  aspects  of 
relevance and reliability. Reliability is the consistency
or self-correlation of a measurement, while validity is its
correlation with some standard or reference independent of
that which is measured [Kelley and Wargo, 1968]. A given
performance measurement can be highly reliable yet not have
validity [Smode, et al., 1962; Kelley and Wargo, 1968]. However,
an unreliable test cannot have practical validity, i.e.,
a measurement that does not even correlate well with itself
will not correlate well with other measurements [Kelley and
Wargo, 1968; Steyn, 1969]. In performance measurement, high
validity must be combined with high reliability; this combination
means that a highly skilled operator consistently must
achieve a higher performance evaluation result than a less
skilled operator [Smode, et al., 1962; Kelly, et al., 1979].
If the performance evaluation occurs during the actual task
instead of a simulated task, the question of validity reduces
simply to the question of reliability, as perfect relevance
will have been achieved [Cureton, 1951].
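
The statement that an unreliable measure cannot have practical validity can be made concrete with a standard result of classical test theory, which is not drawn from the sources cited above: the correlation between a measure and a criterion cannot exceed the square root of the product of their reliabilities. The following is a minimal sketch under that assumption, with a hypothetical function name.

```python
import math


def validity_ceiling(measure_reliability, criterion_reliability=1.0):
    """Upper bound on the measure-criterion correlation under classical test theory."""
    for value in (measure_reliability, criterion_reliability):
        if not 0.0 <= value <= 1.0:
            raise ValueError("reliabilities must lie between 0 and 1")
    return math.sqrt(measure_reliability * criterion_reliability)


# Illustrative reliabilities only; the criterion is assumed perfectly reliable.
for reliability in (0.9, 0.5, 0.1):
    ceiling = validity_ceiling(reliability)
    print(f"reliability {reliability:.1f}: validity can be at most {ceiling:.2f}")
```

The bound is only a ceiling. It says nothing about how high validity actually is, which is why a perfectly reliable measure can still lack validity.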

Because  of  the  unique  relationship  of  validity 
and  reliability,  it  is  generally  easier  and  more  efficient  to 
improve  the  reliability  of  a  measure  than  to  raise  its  validity. 
On the other hand, if the validity of a measure appears promising,
improving its reliability is preferred to using a reliable
measure which has lower validity [Glaser and Klaus, 1966].

8. Selection of Initial Measures

After  the  identification  and  selection  of  a  desired 
mission,  scenario,  flight  segment,  and  human  operator  with 
the  behavior  of  interest,  performance  associated  with  these 
requirements  can  be  specified  and  defined.  The  initial 
selection  of  appropriate  measures  has  been  a  major  problem, 
as  evidenced  by  the  lack  of  concordance  in  recent  aircrew 
performance measurement research [Mixon and Moroney, 1981].
Unless empirical results from aircrew performance measurement are
collected and standardized, an analytical approach must be
taken when initially selecting which measures best describe
the performance being examined. Some optimum balance exists
between the "measure everything that moves" philosophy and
the measurement of a few measures with apparent face validity.
The initial selection of any measures, however, should be
guided by both the purpose of the measurement and the man-machine
system as well as the facility for collecting and
processing the data. The following criteria for initial
measure selection are provided by Meister and Rabideau [1965],
Parsons [1972], Greening [1975], and Vreuls and Wooldridge
[1977]:

(1)  Relevance  -  the  measures  should  be  pertinent  to  the 
purpose  of  measurement. 

(2)  Quantifiable  -  measures  should  be  in  the  form  of 
numerals  on  a  ratio  scale  for  statistical  analysis, 
except  where  only  subjective  collection  is  feasible. 

(3) Accessibility - measures should not only be observable
but easily collectable, preferably by electronic
means.

(4) Operational utility - a measure that has relevance
and accessibility in both the aircraft and simulator
environments.

(5) Efficiency - a measure with utility at minimum cost
of collection and transformation into usable information.

(6)  Content  validity  -  a  positive  correspondence  between 
the  performance  measure  and  what  is  known  about  the 
underlying  behavior. 

(7) Reliability - collection of more measures or
samples of a measure than planned would offset the
likelihood of electronic data collection equipment
failures.

(8) Dependence - the availability of human or automatic
data observers and collectors limits what measures
are feasible to collect.

(9) Objectivity - where possible, automatic data observation
and recording is preferred to human observers
and recorders.

(10)  Usable  -  measures  collected  should  be  usable  for 
either  evaluation  information  or  supportive  to 
evaluation  results. 

(11)  Acceptable  -  measures  that  are  used  by  instructors 
in  the  operational  environment  must  be  consistent 
with  their  expert  judgement. 

(12)  Collection  criteria  -  data  must  be  accurate  and 
precise  to  what  is  known  about  the  underlying 
behavior.

Once  the  performance  measures  of  interest  are  identified  and 
selected,  they  should  be  defined,  as  a  minimum,  in  terms  of 
the  descriptive  structure  as  previously  outlined  by  Vreuls 
and Cotton [1980]. Some consolation for not selecting the
appropriate measures of performance is offered by Knoop
[1968]: "Determining exact criteria [standards] and performance
measures for virtually any flight task or mission is
an accomplishment yet to be made."

B. CRITERIA CONSIDERATIONS

1. Definition and Purpose of Criteria

Criteria  are  standards,  rules,  or  tests  by  which 
measures  of  system  behavior  are  evaluated  in  terms  of  success 
or  failure,  or  to  some  degree  of  success  or  failure.  The 
purpose  of  human  performance  criteria  is  to  provide  standards 
or  baselines  for  evaluating  the  success  or  failure,  goodness 
or  badness,  or  usefulness  of  human  behavior  [Knoop,  1968; 
Vreuls  and  Cotton,  1980;  Cureton,  1951;  Steyn,  1969;  Davis 
and  Behan,  1966;  Shipley,  1976;  Buckhout  and  Cotterman,  1963]. 
The  criterion  is  a  measuring  device  which  is  not  generally  or 
readily  available,  but  a  device  which  should  be  constructed 
from  the  beginning  for  each  particular  situation  [Steyn,  1969; 
Christensen  and  Mills,  1967].  Criteria  should  not  only  define 
the  unique  manner  in  which  the  operator  should  perform  a  task, 
but  should  define  the  performance  objectives  of  the  entire 
man-machine system [Demaree and Matheny, 1965; Connelly, et
al., 1974].

2. Types of Criteria

Criteria can be classified from a measurement
standpoint, beginning with the smallest known
entity and ending with the "ultimate" quantity that may exist.
Several types identified are listed below:

(1) Parametric referent or standard of performance
which is sought to be met by the operator or system.
Example: maintain 500 feet of altitude [Demaree
and Matheny, 1965; Shipley, 1976].

(2) Parametric limit about the parametric standard
within which the operator or system is required,
or seeks, to remain. Example: maintain plus or
minus 100 feet while at 500 feet altitude [Demaree
and Matheny, 1965; Shipley, 1976].

(3) System component criterion which distinguishes the
relationship between system components and system
output. Example: "least effort" measured from
the pilot in relation to maintaining altitude
[Krendel and Bloom, 1963; Uhlaner and Drucker,
1964].

(4) Test criterion used to evaluate overall human
ability, usually expressed as a single overall
measure. Example: subjective judgement of instructor
for a student as to "pass" or "fail"
[Marks, 1961].

(5) Ultimate criteria are multidimensional in nature
and represent the complete desired end result of
a system. This type of criterion is impossible to
quantify due to the multidimensional nature of the
system's purpose, and hence, is a theoretical entity
that must be approximated. Example: Any aircraft's
mission [Cureton, 1951; Smode, et al., 1962; Uhlaner
and Drucker, 1964; Steyn, 1969; Shannon, 1972].

It  may  be  noted  that  all  five  types  of  criteria  can 
be  quantified  or  approximated  in  some  manner,  with  decreasing 
accuracy as the ultimate criteria level is reached. Obtaining
direct measures of the ultimate criteria for a complex
system is seldom feasible. This is particularly true in military
systems where such criteria would be expressed in terms
of combat effectiveness or effectiveness in preventing a
potential aggressor from starting a conflict [Smode, et al.,
1962]. Therefore, it becomes apparent that we must select
intermediate criteria (types one through four above) in
evaluating skilled operator behavior.
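
The first two criterion types lend themselves to direct computation. As a sketch only, the example below scores a series of sampled altitudes against the referent and limit used in the examples above (500 feet, plus or minus 100 feet). The function name, the choice of summary statistics, and the sample values are assumptions made for illustration.

```python
import math


def score_against_limit(samples, referent, limit):
    """Summarize sampled values against a parametric referent and a symmetric limit."""
    if not samples:
        raise ValueError("at least one sample is required")
    errors = [value - referent for value in samples]
    within = sum(1 for error in errors if abs(error) <= limit)
    return {
        "fraction_within_limit": within / len(errors),
        "rms_error": math.sqrt(sum(error * error for error in errors) / len(errors)),
        "max_abs_error": max(abs(error) for error in errors),
    }


# Hypothetical altitude samples in feet (illustrative values only).
altitudes_ft = [510, 480, 620, 450, 395, 500]
result = score_against_limit(altitudes_ft, referent=500, limit=100)
print(result)
```

The fraction of samples within the limit reflects the parametric limit, while the root-mean-square error reflects closeness to the parametric referent itself.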

3. Characteristics of Good Criteria

Using actual criteria as approximations of the ultimate
criteria can be accomplished by several methods that will
be  discussed  in  a  later  section.  Although  there  is  no  certain 
method  that  will  lead  to  the  specification  of  good  criteria, 
there  are  some  considerations  that  can  be  taken  into  account 
which  are  discussed  below: 

(1) A good criterion is both reliable and relevant
[Smode, et al., 1962; Krendel and Bloom, 1963;
Cureton, 1951; Grodsky, 1967; Steyn, 1969].

(2) Criteria must be comprehensive in that the utility
of the individual being evaluated is unambiguously
reflected [Steyn, 1969].

(3) Criteria should possess selectivity and have ready
applicability [Krendel and Bloom, 1963].

4. Other Criteria Characteristics

Steyn  [1969],  in  a  review  of  criterion  studies,  noted 
that  performance  measures  under  simulated  conditions  can  at 
best  serve  as  substitute  criteria.  This  observation  reflects 
the  engineering  and  mathematical  model  of  reality  represented 
by  the  simulator  that  can,  at  best,  approximate  an  aircraft 
and its systems. Shipley and Gerlach [1974] measured pilot
performance in a flight simulator (T4-G) and found that
pilot performance outcomes varied as a function
of the criterion limits that were established,
with the relationship between criterion limits and tracking
performance found to be a nonlinear one. The multidimensionality
of a criterion was also noted by Steyn [1969], Cureton
[1951], and Connelly, et al. [1974]. These latter two studies
observed that multiple criteria must exist for a single task
since different operator action patterns having no single
feature in common could conceivably obtain the same desired
system output.

5. Establishment of Criteria

Criteria may be derived from regulatory requirements,
system operating limits, knowledge of common practice,
or empirical studies [Vreuls and Cotton, 1980]. When criteria
are established analytically, some caution must be taken.
Campbell, et al. [1976], in designing the A-6E TRAM training
program using ISD methodology, observed that:

A  standard  or  criterion  of  performance  for  that  terminal 
behavior must also be established . . . at a level comparable
to the earlier established operational standards.
These  latter  criteria,  however,  while  reflecting  an 
acceptable  level  of  behavior,  imply  a  repeatability, 
that  .  .  .  whenever  they  are  performed,  that  some 
acceptable  level  will  be  attained. 

Criteria  derived  from  objective,  empirical  techniques  are 
preferable to analytical methods [Steyn, 1969]. Regression
analysis, discriminant function analysis, multivariable regression,
and norm or group referencing are but a few of the
empirical approaches to establishing criteria [Connelly, et
al., 1974; Danneskiold, 1955; Dawes, 1979]. No matter which
method  is  used,  criteria  must  be  defined  and  are  necessary 
for  the  evaluation  process. 
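
Of the empirical approaches named above, norm or group referencing is the simplest to illustrate. The sketch below is an assumption about one common form of the technique, not the method of any cited study: it expresses a student's score in standard deviation units relative to a reference group. The function name, the reference group, and the miss-distance values are hypothetical.

```python
import statistics


def norm_referenced_score(score, reference_scores):
    """Return the score as a z-score relative to a reference (norm) group."""
    mean = statistics.mean(reference_scores)
    spread = statistics.stdev(reference_scores)
    if spread == 0:
        raise ValueError("reference scores must not all be equal")
    return (score - mean) / spread


# Hypothetical miss distances in feet for a reference group of qualified crews.
reference_group = [100, 120, 80, 110, 90]
student_miss = 130
z = norm_referenced_score(student_miss, reference_group)
print(f"student miss distance is {z:.2f} standard deviations above the group mean")
```

A criterion could then be stated as a maximum acceptable distance from the group mean, although the choice of reference group and cutoff remains a matter of judgment.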

6. Sources of Criterion Error

As  previously  mentioned,  a  good  criterion  is  one  that 
is  both  reliable  and  relevant.  Reliability,  as  previously 
defined,  implies  that  what  constitutes  successful  performance 
will  be  resistant  to  the  effects  of  chance  factors.  Relevancy 
refers to the validity of the actual or approximated criterion
with respect to the ultimate criterion. By definition, the
ultimate criterion is completely relevant. Sources of criterion error
can then be identified in terms of reliability and relevance.
Smode, et al. [1962] list some significant sources of
criterion error:

(1)  Low  reliability,  as  previously  mentioned. 

(2)  Irrelevancy  or  the  lack  of  relation  of  the  actual 
criterion  with  respect  to  the  ultimate  or  "ideal" 
criterion. 

(3)  Contamination  of  the  criterion  by  the  presence  of 
factors  or  ingredients  in  the  actual  criterion 
which  do  not  in  fact  comprise  the  ultimate  criterion. 

(4)  Distortion  in  the  criterion  caused  by  errors  arising 
from  assigning  incorrect  weights  to  the  separate 
factors that comprise the actual criterion (combining
criteria is discussed below).

7. Measures of Effectiveness

Aircraft missions are all multidimensional in nature.
This means that every mission can usually be divided into one
overall goal or purpose (e.g., destroy the target, deliver
the supplies, or rescue the survivors), with several subgoals
(safety, minimize susceptibility, timeliness, etc.).
Since missions are multidimensional, the operator effort in
the form of mental and physical action (performance) becomes
multidimensional. The multidimensional nature of skilled aircrew
performance, in turn, requires that several criteria, all
of which are relevant for a particular activity, be defined
and used [Smode, et al., 1962]. Each of these criteria must
be operationally defined and theoretically quantifiable, and
collectively they must give a reasonable portrayal of operator
and system performance. Typically, one may wish to bring these component
criteria together in an overall effectiveness measure - a single
"measure of effectiveness" for the system being investigated.
The process of combining criteria into a single composite
measure of effectiveness is one of the most difficult tasks
to undertake in any field of research, and has been the focus
of continuous investigation in the science of Operations Research
for decades [Hitch, 1953; Morris, 1963; Steyn, 1969;
Lindsay, 1979; Dawes, 1979].

A  Measure  of  Effectiveness  (MOE)  is  a  quantifiable 
measure  used  to  compare  the  effectiveness  of  the  alternatives 
in  achieving  the  objective,  and  must  measure  to  what  degree 
the actual objective or mission is achieved [Operations Committee,
Naval Science Department, 1968]. For the particular
situation of measurement and evaluation of an aircrew member's
performance in an aircraft, criteria can be thought of as
"alternatives," each of which has an individual effectiveness
for each performance component of an aircrew member, who is
accomplishing an overall objective - the mission. MOEs have
been applied in economics, management, and military problems
- just about any area where a decision based on information
from system performance has to be made.

Combining  criteria  into  an  MOE  can  be  accomplished 
either analytically or statistically. Most methods concentrate
on assigning weights that are determined either on the
basis of "expert" opinion or by statistical treatment [Smode,
et al., 1962; Steyn, 1969]. In a review of numerous
criterion weighting studies, Steyn [1969] concluded that
"it would appear that the most acceptable approach [to weighting
criterion variables] would be to identify the job dimensions
clearly and unambiguously and to use these pure dimensions
as  criteria  to  be  predicted  independently." 

There is no established procedure for combining criteria
into a single overall MOE. Lindsay [1979] and Smode,
et al. [1962] offer some suggestions for approaching the problem:

(1)  Look  at  the  big  picture.  Examine  what  is  to  be 
done  with  the  results  of  the  aggregation.  Determine 
how  the  numbers  will  be  used,  and  in  what  decisions. 
(Usually  one  finds  that  this  has  not  been  thought 
out  in  advance.)  It  may  be  that  all  that  is  really 
needed  is  the  identification  of  satisfactory  systems. 

(2) If possible, aggregate subjectively. Give the sub-criteria
values to the decision-makers or their
advisers and let them subjectively determine how
effective the systems are.

(3) Recognize that one is defining, not approximating.
The development of a formal scoring system should
be done with the awareness that a definition of
system effectiveness is being made. The procedure
developed should include reference points and diminishing
marginal returns, and should avoid substitutability
except where appropriate.

(4) Sub-criteria should be weighted in accordance with
their relevance to the ultimate criterion.

(5) Sub-criteria which repeat or overlap factors in
other sub-criteria should receive a low weight.

(6) Other things being equal, the more reliable sub-criteria
should be given greater weight.

The unique situation of an aircrew flying an aircraft for a
specific mission and the necessary determination of sub-criteria
for evaluating the overall accomplishment of that
mission require further research of an analytical and empirical
nature. The relationship among altitude, airspeed,
operator activity, and the hundreds of other system variables
that comprise the total system must be compared to mission
success in quantifiable terms. Since "mission success" is
multidimensional and may not be totally measurable and quantifiable,
some analytical approaches toward combining criteria
into an overall MOE appear to be feasible, notwithstanding the
possible use of empirical methods to describe some aspects of
the process.
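
One analytical approach consistent with suggestions (4) through (6) above can be sketched as a weighted sum. The weighting rule used here (relevance times reliability, discounted for overlap, then normalized) is an assumption chosen for illustration and is not a procedure given by Lindsay [1979] or Smode, et al. [1962]. The class name, function name, and all numerical values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SubCriterion:
    name: str
    score: float        # performance on this sub-criterion, scaled 0 to 1
    relevance: float    # judged relevance to the ultimate criterion, 0 to 1
    reliability: float  # estimated reliability, 0 to 1
    overlap: float      # share of content duplicated by other sub-criteria, 0 to 1


def combine(sub_criteria):
    """Return (moe, weights) from normalized relevance x reliability x (1 - overlap)."""
    raw = [c.relevance * c.reliability * (1.0 - c.overlap) for c in sub_criteria]
    total = sum(raw)
    if total <= 0.0:
        raise ValueError("at least one sub-criterion must have a positive weight")
    weights = [value / total for value in raw]
    moe = sum(weight * c.score for weight, c in zip(weights, sub_criteria))
    return moe, weights


# Hypothetical sub-criteria for a single mission (illustrative values only).
sub_criteria = [
    SubCriterion("target destroyed", score=0.8, relevance=1.0, reliability=0.9, overlap=0.0),
    SubCriterion("timeliness", score=0.6, relevance=0.5, reliability=0.8, overlap=0.0),
    SubCriterion("altitude control", score=0.9, relevance=0.5, reliability=0.8, overlap=0.5),
]
moe, weights = combine(sub_criteria)
print(f"MOE = {moe:.2f}; weights = {[round(w, 3) for w in weights]}")
```

A weighted sum permits full substitutability among sub-criteria, so a high score on one can offset a low score on another. Suggestion (3) cautions against exactly this, and a scoring system intended for use would need to address it.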

8. Selection of Criteria

Criteria  have  been  defined,  their  purpose  established, 
some  types  identified,  and  some  characteristics  discussed. 
Since a large number of criteria may exist to evaluate a particular
system, some selection in the way of "trade-offs" may
be necessary [Davis and Behan, 1966]. More than one criterion
may describe the same dimension of performance whereas another
carefully selected criterion may accurately describe more than
one performance dimension. Reducing the number of criteria
that are relevant, reliable, and practical into a feasible
and usable set that can accurately and consistently evaluate
the performance of an aircrew and the accomplishment of their
mission is extremely difficult at present and will probably
remain an unsolved problem in the Human Factors
field unless specific research is undertaken to attack it.
In the meantime, some general guidelines for selecting criteria,
as discussed by Hitch [1953] and Smode, et al. [1962],
are listed below:

(1) Selection of any criteria should always be consistent
with the highest level or type of criterion
associated with the system mission.

(2)  Specify  the  activity  in  which  it  is  desired  to 
determine  successful  and  skillful  performance. 

(3)  Consider  the  activity  in  terms  of  the  purpose  or 
goals,  the  types  of  behaviors  and  skills  that  seem 
to  be  involved,  the  relative  importance  of  the 
various  skills  involved,  and  the  standards  of 
performance  which  are  expected. 

(4) Identify the elements that contribute to successful
performance and weight these elements in terms
of their relative importance.

(5)  Develop  a  combined  measure  of  successful  performance 
composed  of  sub-criteria  that  measure  each  element 
of  success  and  are  weighted  in  accordance  with  the 
relative  importance  of  each. 

The definition, computation, combination, and selection
of criteria are perhaps the most difficult problems encountered
by researchers investigating complex man-machine systems
[Osborn, 1973; Krendel and Bloom, 1963]. The importance of
criteria is once more emphasized, as stated by McCoy [1963],
"You must define the criterion precisely and accurately
before interpreting any measures used in investigating a
system." Christensen and Mills [1967], in quoting earlier
work done by H.R. Leuba, stated:

There  are  many  ludicrous  errors  in  quantification  as 
it  is  practiced  today,  but  none  quite  as  foolish  as 
trying  to  quantify  without  a  criterion.  It  is  awkward 
enough  to  quantify  the  wrong  thing  when  a  criterion 
exists,  but  it  is  a  sham  of  the  most  unprofessional 
sort  to  quantify  in  the  absence  of  a  criterion. 

C. PERFORMANCE MEASUREMENT CONSIDERATIONS

1. Subjective Versus Objective Measures

As  previously  discussed  in  Chapter  I,  subjective  and 
objective  measures  are  not  dichotomous,  but  rather  represent 
a continuum of performance measurement. At one extreme of
the continuum, a human observer mentally records actual performance
during a specified mission, and uses his perceptions
to form a judgement or degree of success rating as to how
skillful the operator was in achieving the system objectives.
This extreme is the subjective method of measurement and
evaluation. At the other extreme of the performance measurement
continuum, automatic digital computers sense, record,
transform, analyze, and compare actual man and system performance
to statistically established criteria and form a complete
set of performance data or information to be used by the
decision-maker (instructor or training officer) in evaluating
the skills and abilities of an operator. This other extreme
is the objective method of measurement and evaluation.

Each method of performance measurement has advantages
and disadvantages, which were discussed previously and will not
be repeated here. Objective measures and measurement have
become more feasible and less costly to implement for aircrew
performance than in previous decades, and the method has
established itself as a very powerful and useful model for
describing actual human behavior [Mixon and Moroney, 1981].

2. Combining Measures

What has been discussed previously with respect to
combining criteria also applies to combining performance
measures into a single overall index of skill level or
proficiency. As Smode, et al. [1962] indicated:

(1)  Measures  should  be  weighted  in  accordance  with 
their  relevance  to  the  criterion. 

(2)  Measures  which  repeat  or  overlap  factors  included 
in  another  measure  should  receive  a  low  weight. 

(3)  Other  things  being  equal,  the  more  reliable 
measures  should  be  given  greater  weight. 

In  combining  performance  measures,  it  is  often  possible  to 
determine  quantitatively  the  interrelationships  among  the 
performance  measures  and  the  relationship  between  each  measure 
and  the  actual  or  immediate  criterion  [Smode,  et  al.,  1962]. 
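
The weighting guidance above can be illustrated with a minimal sketch of a linear-weighted composite, one of the simplest ways such a combination might be computed. The measure names, the normalization of each measure to a 0-to-1 scale, and the weight values are assumptions made for illustration only; they are not taken from Smode, et al. [1962] or from any fielded scoring system.

```python
def composite_score(measures, weights):
    """Combine normalized measures (0..1, higher is better) into one weighted score."""
    if set(measures) != set(weights):
        raise ValueError("measures and weights must name the same dimensions")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted mean: each measure contributes in proportion to its weight.
    return sum(measures[name] * weights[name] for name in measures) / total_weight


# Hypothetical, already-normalized measures for one simulated mission.
measures = {"turn_point_accuracy": 0.80, "time_on_target": 0.60, "procedures": 1.00}
# Hypothetical weights: larger for relevance and reliability, smaller for overlap.
weights = {"turn_point_accuracy": 0.5, "time_on_target": 0.3, "procedures": 0.2}

score = composite_score(measures, weights)
```

In such a scheme, a measure that overlaps another or is known to be less reliable would simply be assigned a smaller weight before the sum is taken.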

A single overall measure or score composed of numerous
performance measures along different dimensions of system behavior
is highly desirable in any performance measurement and
evaluation system, because the total score is used to determine
overall performance when compared to a criterion. A single
score or estimate of total performance, when compared to a
criterion or MOE, provides the necessary information for
evaluation that determines goodness or badness, success or
failure, and usefulness of human performance [Buckhout and
Cotterman, 1963].

Combining performance measures, like criteria, can be
performed by either analytical or empirical methods. Both
commonly assign a weight to each measure; the weighted measures
are then mathematically combined into a single proficiency score. Analytical
methods employ the judgement of experts for situations usually
involving complex man-machine systems where definitive and
quantifiable measures of output are not available [Glaser and
Klaus, 1966; Marks, 1961]. Empirical methods of combining
aircraft system performance measures with relative weightings
into a single score were reviewed by Vreuls and Obermayer
[1971]. Among the methods from that study for developing
multidimensional algorithms were: factor analysis,
multiple discriminant analysis, linear-weighted algorithm,
nonlinear (threshold) model, energy maneuverability model,
time demand, recursion models, and empirical curve fit. The
interested reader is referred to that study for more detail
on each model and the circumstances in which it was employed.

In summary, separate performance measures of different
behavioral dimensions are combined by various analytical and
empirical methods into a single overall or composite score
with the idea that when the score is high, as compared to a
predetermined criterion or MOE, it indicates "good" or
"successful" performance, and when low, indicates "poor" or
"unsuccessful" performance [Cureton, 1951]. Difficulties in
combining the measures into a single overall score lead
to preservation of the behavior dimensions and a "vector" of
measures.

3. Overall Versus Diagnostic Measures

Overall measures of skilled performance, as previously
discussed, along with total system output measures (e.g.,
bomb-drop accuracy, number of targets hit, fuel consumed) are
beneficial in assessing total system performance but are
seriously lacking in diagnostic information of potential value
to the trainee [Buckhout and Cotterman, 1963; Kelley and
Wargo, 1968; Bergman and Siegel, 1972]. Overall scores tell
nothing about the operator's performance on various specific
tasks which are involved in flying an aircraft on a mission,
but are highly useful for performance evaluation.

Diagnostic measures are the same measures that result
from performance measurement before any combining operations
take place. These measures identify certain aspects or
elements of a task or performance in specific skill areas and
provide useful information on strengths and weaknesses in
individual skills [Smode, et al., 1962]. Since they are
concerned with smaller and more precisely defined units of
behavior, they are easier to measure by objective methods.

It thus appears that overall and diagnostic measures
are contradictory, yet both are essential. For the training
environment, where a student is learning skills necessary to
perform a task, both measures are valuable for the information
they provide, as discussed above. The value of using the two together in
a complementary fashion was perhaps best stated by Smode, et
al. [1962]: "A prime value of an overall measure is the support
it provides in evaluation since diagnostic measures alone are
difficult to interpret without some terminal output measure
of performance." Kelley and Wargo [1968] recommended that
performance be measured in each dimension, evaluated
separately by comparison to specific and predefined criteria, and
then combined into an overall total score, so the trainee can
receive feedback relating to his relative performance on the
various dimensions of his task, as well as on his overall
performance. More recently, Vreuls and Wooldridge [1977]
described multivariate statistical modeling techniques that
are powerful enough to provide measures for diagnosis, and
yet also provide single measures that could be combined into
an overall score.
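
The Kelley and Wargo [1968] recommendation (measure each dimension, evaluate it against its own criterion, then combine) can be sketched as follows. This is only an illustration under stated assumptions: the dimension names, error limits, and weights are hypothetical, and the pass/fail-per-dimension rule is one simple choice among many that would satisfy the recommendation.

```python
def evaluate(measured, criteria, weights):
    """Return (diagnostic, overall) for one mission.

    diagnostic: per-dimension pass/fail, kept separate for trainee feedback.
    overall:    weighted share of dimensions that met their criterion (0..1).
    """
    diagnostic = {}
    for name, limit in criteria.items():
        # Each measure is an error magnitude; it passes if within its own limit.
        diagnostic[name] = abs(measured[name]) <= limit
    total = sum(weights[name] for name in criteria)
    overall = sum(weights[name] for name, ok in diagnostic.items() if ok) / total
    return diagnostic, overall


# Hypothetical measured errors, criteria (maximum tolerable error), and weights.
measured = {"cross_track_error_nm": 0.4, "timing_error_s": 25.0, "altitude_error_ft": 80.0}
criteria = {"cross_track_error_nm": 0.5, "timing_error_s": 15.0, "altitude_error_ft": 100.0}
weights = {"cross_track_error_nm": 2.0, "timing_error_s": 1.0, "altitude_error_ft": 1.0}

diagnostic, overall = evaluate(measured, criteria, weights)
```

The diagnostic dictionary shows the trainee which dimension fell short (here, timing), while the overall value supports a single evaluation of the mission.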

The qualities of overall and diagnostic measures have
been described and their relationship has been discussed.
Within the training environment, using one without the other
appears to cause a decrease in the quality of information
available to the individuals who most need it: both student
and instructor. Therefore, for the design of a system to
measure student B/N performance during a radar navigation
mission, it appears advantageous to employ both
overall and diagnostic measures in a mutually beneficial
manner that will provide the maximum amount of accurate
information for the purposes of training situations.

4. Individual Versus Crew Performance

One  of  the  assumptions  of  this  thesis  is  that  the 
variability  of  the  total  contribution  of  the  pilot  in  the 
conduct  and  successful  accomplishment  of  the  radar  navigation 
mission is small enough to essentially be ignored when
measuring the performance of the B/N during the same mission. This
assumption  was  based  on  the  major  role  played  by  the  B/N 
during  radar  navigation,  the  unique  design  of  the  A-6E  CAINS 
navigation  system,  and  the  radar  navigation  mission  itself. 
Although  it  is  recognized  that  any  successful  accomplishment 
of  a  mission  depends  to  some  degree  on  the  crew  interactions 
and  coordination,  the  actual  measurement  of  the  interaction 
and  coordination  was  beyond  the  current  scope  of  study,  and 
will  be  left  for  future  investigation  and  research. 

The unique problem of measuring crew-coordinated
performance has been the focus of much research, but the question
of what "crew coordination" is remains unanswered [Mixon and
Moroney, 1981]. Smode, et al. [1962] provided a detailed
discussion on the problems and approaches taken in measuring
aircrew coordination, and concluded that measured interaction
and communication were good for differentiating "good" and
"bad" crews. Good crews reduced individual interaction and
communication to a minimum so that more time was available to
devote effort to performing the individual tasks associated
with accomplishing the mission. This conclusion does not
acknowledge the inherent difficulties involved in objectively
measuring individual interaction and communications. Until
further research uncovers objective, valid, reliable, and
practical methods of measuring crew coordination, this area
is perhaps a measurement function best delegated to a human
observer (instructor).

5.  Measures  and  Training 

The importance of overall and diagnostic measures in
the training environment has been previously discussed.
Measures for the evaluation of performance are related in some
degree to the stages of training. Early in training, when
skilled behavior is made up largely of familiarization with
the task and basic knowledge of procedures, measurement may
consist more of familiarization and procedure-related measures.
Late in training, when skilled performance has become more or
less automatic, measurement becomes more difficult due to the
highly cognitive and covert nature of the skilled behavior
[Fitts, 1965; Glaser and Klaus, 1966]. In this case,
measurement becomes more indirect than direct.

Designing any measurement system within the training
environment requires a detailed understanding of the training
process and its relationship to performance measures, in
addition to an explicit understanding of the basic nature
of the skills involved in performing the task. The latter
subject will be discussed in Chapter V.

D.  PERFORMANCE  EVALUATION  CONSIDERATIONS 

Although  the  purpose  of  this  thesis  is  to  design  a  system 
to  improve  current  performance  measurement  techniques  for  the 
FRS  B/N  student,  some  mention  must  be  made  of  performance 
evaluation  since  performance  measurement  exists  as  information 
necessary to evaluate individual and system performance.
Without evaluation, little reason exists for the measurement,
recording, and storage of performance measures. This section
will briefly outline current evaluation methods and the
characteristics of evaluation itself.

1.  Definition  and  Purpose 

Consistent with the previous discussions on
performance measurement and criteria, performance evaluation
is simply the process of identifying and defining performance
criteria and then comparing the criteria to performance measures
produced by performance measurement. All performance
evaluation requires some comparison between a standard and an
estimate of what the standard truly represents [Angell, et al.,
1964; Demaree and Matheny, 1965]. The purpose of performance
evaluation in the training environment is usually
multidimensional in nature, but all evaluation occurs for the purpose of
accurate decision-making by the instructor regarding student
performance and by the training officer for effective
training control. At the instructor level of evaluation, faulty
decision-making in any performance evaluation involves
two possible errors, Type I and Type II, as shown in Table V.


TABLE V: PERFORMANCE EVALUATION DECISION MODEL
FOR THE INSTRUCTOR

                                  REALITY:
DECISION:                         UNSKILLED           SKILLED

Student has not acquired          Correct decision    Type I (α)
skill to perform task

Student has acquired              Type II (β)         Correct decision
skill to perform task

The  tangible  effects  of  a  Type  I  error  are  possible  increased 
costs  due  to  overtraining,  an  inefficient  training  flow  of 
students,  and  a  demotivated  student.  On  the  other  hand,  a 
Type  II  error  may  result  in  increased  costs  due  to  an  aircraft 
accident  and  the  loss  of  human  life.  This  example  illustrates 
the important role that performance measurement and subsequent
evaluation play in providing accurate information necessary
for  correct  decision-making  by  the  instructor. 
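
The decision model can be sketched in a few lines of code. The sketch below assumes, purely for illustration, that the student's true skill state is known after the fact (in practice it can only be estimated), and the sample records are hypothetical.

```python
from collections import Counter


def classify_decision(judged_skilled, actually_skilled):
    """Label one instructor decision using the Type I / Type II convention above."""
    if judged_skilled == actually_skilled:
        return "correct"
    # Judged skilled but actually unskilled is Type II (risk of accident);
    # judged unskilled but actually skilled is Type I (risk of overtraining).
    return "type_II" if judged_skilled else "type_I"


# Hypothetical (decision, reality) pairs for five students.
records = [(True, True), (False, True), (True, False), (False, False), (True, True)]
tally = Counter(classify_decision(decision, reality) for decision, reality in records)
```

A tally of this kind makes the asymmetry of the two errors visible: each can be counted separately and weighed against its own cost.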

2. Types of Performance Evaluation

Based on the purpose of the evaluation, evaluation
may be divided into two general types: aptitude and
achievement. According to Marks [1961], if the purpose is to predict
the capacity of a trainee to absorb training and perform a
task, the evaluation is called an aptitude test. If the
purpose is to tell how well the trainee has absorbed training or
can perform the task, the evaluation is called an achievement
measure. When considering achievement measures, it is possible
to distinguish three basic kinds:

(1)  Proficiency  tests  require  the  individual  to  answer 
questions  about  his  job  or  about  some  content 
knowledge  area  related  to  his  job. 

(2)  Performance  tests  involve  controlled  observation 
of  an  individual  actually  performing  his  job. 

(3)  Rating  methods  use  the  opinion  of  someone  who  has 
actually  seen  the  man's  performance  on  the  job. 

For details on the characteristics, advantages, and
disadvantages of each kind of achievement measure, the interested
reader is referred to Marks [1961].

A model is anything that represents reality. Two
performance evaluation models that are utilized in
achievement measures are norm-referenced testing and
criterion-referenced testing.

a. Norm-Referenced Testing

Norm-referenced testing involves the use of
norm-referenced measures in evaluating performance. Norm-referenced
measures compare the performance of an individual with the
performance of other individuals having similar backgrounds
and experience [Glaser and Klaus, 1966; Knoop and Welde, 1973;
Danneskiold, 1955]. The stability of a norm-referenced
measure is highly dependent upon sample size. Too small a sample
can yield measures of central tendency and variability that
poorly approximate actual population values [Glaser and Klaus,
1966; Danneskiold, 1955].

b. Criterion-Referenced Testing

Criterion-referenced testing uses criterion-referenced
measures for making an evaluation of performance.
These measures involve a comparison between system
capabilities and individual performance [Glaser and Klaus, 1966].
Such measures indicate whether an individual has reached a
given performance standard [Knoop and Welde, 1973]. The
standard for criterion-referenced measures may be determined
by analysis, by the subjective judgements of a panel of
experts, or from numerous successful performances sampled from
a large population [Knoop and Welde, 1973].
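
The difference between the two models can be made concrete with a short sketch. The peer scores, the student's score, and the performance standard below are hypothetical values, and the standard score (z) is used here only as one common way of expressing a norm-referenced result; the sources cited above do not prescribe it.

```python
from statistics import mean, stdev


def norm_referenced(score, peer_scores):
    """Standard score (z) of one student relative to a peer group."""
    return (score - mean(peer_scores)) / stdev(peer_scores)


def criterion_referenced(score, standard):
    """True if the student has reached the fixed performance standard."""
    return score >= standard


# Hypothetical scores for five peers with similar background and experience.
peers = [60.0, 70.0, 80.0, 90.0, 100.0]

z = norm_referenced(90.0, peers)        # position relative to the group
met = criterion_referenced(90.0, 85.0)  # position relative to the standard
```

With only five peers, the group mean and standard deviation are themselves unstable estimates, which is the sample-size sensitivity noted for norm-referenced measures; the criterion-referenced result does not depend on the peer group at all.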

c. Criterion- Versus Norm-Referenced Testing

A recent article by Swezey [1973] reviewed and
described the relative advantages and disadvantages of
criterion-referenced and norm-referenced testing; its
conclusions are cited below:

Content  validated  criterion-referenced  tests,  which  are 
derived  from  appropriate  job,  task,  or  training  analyses, 
often  provide  the  best  available  measure  of  performance; 
particularly  in  objectives-oriented  situations.  It  is 
often  the  case  that  no  better  criterion  exists  upon  which 
to validate the instrument.

Other researchers have supported this conclusion, especially
in the field of aircrew training performance measurement, and
criterion-referenced testing is perhaps a more feasible
alternative to the more traditional and less efficient method
of norm-referenced testing
[Knoop and Welde, 1973; Waag and Eddowes, 1975; McDowell,
1978; Uhlaner and Drucker, 1980].

3. Accuracy of Evaluation

The  accuracy  of  evaluation  is  dependent  upon  the 
accuracy  of  measurement,  the  accuracy  and  relevance  of  the 
criteria,  and  the  evaluation  conditions.  Since  the  accuracy 
of  measurement  and  criteria  have  already  been  addressed,  this 
section  will  be  limited  to  evaluation  conditions. 

During any evaluation, several sources of contamination,
or bias, as discussed by Danneskiold [1955] and Glaser
and Klaus [1966], may affect the performance evaluation of
individuals, and are listed below:

(1) In performance testing, one individual may
naturally perform better than another during the
examination situation, even though both may
actually possess the same skill level.

(2) The sequence and construction of the simulated
mission test may cause some individuals to respond
in a way that is dependent only on the test
sequence and construction.

(3) Judgemental errors occur whenever individuals are
used to observe performance, due to prejudices
and stereotypes formed by the observer.

(4) Evaluating condenses performance dimensions into
a compact and meaningful unit, and in the process
some information is lost.

(5)  Observed  performance  is  only  a  sample  of  the  total 
skills and knowledge of the individual.

The accuracy of evaluation may be increased by
improving either measurement accuracy, the accuracy and relevance of
criteria, or the evaluation conditions. For evaluation
conditions, the common method of eliminating some of the five bias
factors mentioned above is to increase objectivity in
measurement and to standardize the test conditions [Glaser and Klaus,
1966].

4. Evaluating Individual and Group Differences

The measurement of differences in individual performance
is highly desirable in a training situation. As training
progresses, the performance of individual trainees gradually
approaches the desired minimum skill level required for system
operation. The accurate evaluation of what skill level the
individual actually possesses during the training process is
necessary for efficient training control and for efficient
instruction [Glaser and Klaus, 1966]. Several methods have
been developed to identify which performance measures can best
discriminate among individuals at various ability levels; these
will be discussed in Chapter VII due to their applicability in
designing the measurement system at hand [Parker, 1967;
Buckhout and Cotterman, 1963; Thorndike, 1951].

Measuring  group  differences,  as  opposed  to  individual 
differences,  is  more  suitable  for  the  purposes  of  treatment 
evaluation,  such  as  the  training  method,  length  of  instruction, 
and  design  of  displays  and  controls,  and  will  not  be  addressed 
in detail here. Interested readers may consult Glaser and
Klaus [1966] or Moore and Meshier [1979] for further
discussions of measuring methods for group differences.

5.  Characteristics  of  Evaluation 


Some characteristics and considerations that
contribute to improving the evaluation process are listed below:

(1) Repeatability of a measure implies that a specified
score achieved today represents the same level of
performance as it did at a previous time (temporal
invariance) [McDowell, 1978].

(2) Sensitivity of a measure occurs when a measure
reliably changes whenever the operator's performance
changes [Grodsky, 1967; Kelley and Wargo, 1968;
Knoop and Welde, 1973].

(3) Comprehensiveness of the measures employed in
covering as wide a range of flying skills as possible
[Ericksen, 1952].

(4) Interpretability of measures and evaluation results
[Demaree and Matheny, 1965; Waag and Eddowes, 1975;
McDowell, 1978].

(5) Immediately available measures and scores to provide
the student with knowledge of results [Buckhout and
Cotterman, 1963; Demaree and Matheny, 1965; Welford,
1971; Waag and Eddowes, 1975; McDowell, 1978; Kelly,
1979].

(6) Economical considerations require that evaluation
be constrained by cost and availability of
personnel, yet adequate at a minimum level for the purpose
at hand [Marks, 1961; Demaree and Matheny, 1965].

(7) Standardization of test conditions and environments
enables performance to more accurately reflect true
operator skill [Ericksen, 1952; Marks, 1961; Smode,
et al., 1962; Demaree and Matheny, 1965].

This  list  of  desirable  characteristics  of  evaluation 
is  not  all-inclusive  but  does  provide  some  foundation  for 
examining  existing  evaluation  systems  for  those  properties 
that are in consonance with system and evaluation goals.


V. THE NATURE OF THE BOMBARDIER/NAVIGATOR TASK


A.  INTRODUCTION 

Navigating an A-6E TRAM aircraft during a low altitude,
non-visual air interdiction mission is perhaps one of the most
demanding and complex tasks expected of navigators today. The
aircraft must avoid rough or mountainous terrain while
traveling narrow corridors between geographical turn points, which
must be crossed with pinpoint accuracy at predesignated times.
Literally hundreds of individual steps or procedures are
involved in navigating the aircraft, each of which contributes
in some dimension to attaining the mission objective. Figure
2, adapted from Obermayer and Vreuls [1974], shows the
crew-system interactions in the A-6E aircraft. The pilot controls
the aircraft and manages aircraft flight systems while
receiving visual and auditory navigational information from the
Vertical Display Indicator (VDI) and B/N, respectively. As
can be noted in Figure 2, the B/N manages the navigational
equipment, processes large amounts of concurrent information,
and makes critical decisions regarding the navigational
accuracy of the aircraft. At any one time, the B/N may be executing
tasks that are dichotomous, sequential, continuous, monitorial,
computational, or decisional in nature. At all times he is
serving as the systems manager.

Figure 2. A-6E Crew-System Network.


One of the more difficult subtasks is radar scope
interpretation. This activity involves the recognition of the
relationship between specific symbols or patterns of symbols
on a flight chart with the specific returns or patterns of
returns on the radar scope [Beverly, 1952]. The success of
this identification subtask depends largely upon the quantity
and quality of a priori information about the target that was
available to the radar navigator [Williams, et al., 1960].

This  subtask  may  be  performed  while  the  B/N  is  monitoring  the 
Inertial  Navigation  System  (INS) ,  observing  computer-generated 
navigational  information  displays,  and  informing  the  pilot 
about  current  equipment  status.  It  is  an  axiom  among  student 
B/Ns  that  "if  you  are  sitting  there  perceiving  that  everything 
has  been  done,  you  are  getting  behind." 

This section will define and describe the nature of the
B/N's tasks in terms of: (1) the physical variables of the
aircrew-aircraft system, and (2) the complex skills and
abilities of a perceptual, psychomotor, and cognitive nature.
The importance of operationally describing and systematically
classifying the B/N's tasks from a behavioral point of view
is that such an analysis may point to areas where measurement
of performance is both desirable and feasible, and may
indicate the relationships of individual tasks to overall mission
success [Smode, et al., 1962; Vreuls, et al., 1974]. The
tool used to define and describe the tasks of the B/N will be
a task analysis.

B. TASK CONSIDERATIONS

1.  Task  Definition 

A task is one or more activities performed by a single
human operator to accomplish a specified objective [Connelly,
et al., 1974]. In the aviation training environment, a
navigation task is the successful action of the navigator in
response to visual, aural, vestibular, and tactile information
concerning the actual and desired values of a particular
parameter (or more than one parameter) associated with navigating
the aircraft, usually after completing a lesson or series of
lessons [Demaree and Matheny, 1965; Anderson and Faust, 1974].

2. Classification of Tasks

There are numerous task classification methods, all
of which depend on the purpose of describing the tasks and the
nature of the tasks themselves. Smode, et al. [1962]
classified behavior for performance measurement purposes with the
idea of accommodating both diagnostic measures relating to
elemental tasks and the more global measurements
relating to overall system performance. These general behavior
classes are listed and defined as follows:

a.  Level  I  -  Elemental  Tasks 

The simplest level of analysis, referring to any
homogeneous series of work sequences conducted at one time,
or single actions taken toward accomplishing a desired
objective. These tasks range from short duration discrete
homogeneous acts to longer sequences of routine activity.

b. Level II - Complex Tasks

The composite of activities which involve
identifiable sequences of homogeneous activity or recurring single
actions and sub-routines in performance. Each complex task
is made up of tasks from Level I, involving either the
simultaneous and/or sequential integration of combinations of
elemental tasks, or the repetition of a single Level I activity
over time.

c.  Level  III  -  Mission  Segments 

The  segments  or  phases  of  performance  that  are 
identified  in  full  mission  activity.  Essentially,  a  segment 
is  composed  of  a  group  of  complex  tasks  (Level  II)  which  are 
integrated  in  the  performance  at  this  level  of  description. 

d.  Level  IV  -  Overall  Missions 

The  major  types  of  missions  anticipated  for 
advanced  flight  vehicles.  Each  mission  is  composed  of  a 
group  of  segments  of  activity  which  are  integrated  in  the 
performance  at  this  level  of  description. 

Beginning  with  overall  missions  and  ending  with  elemental 
tasks,  these  progressive  refinements  in  task  specificity  allow 
performance  measurement  decisions  to  be  made  at  progressively 
more  detailed  levels. 
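
The four-level classification lends itself to a simple tree structure. The sketch below is a hypothetical illustration of that idea, not a representation used by Smode, et al. [1962]; the `Task` class, the `elemental_tasks` helper, and every task name are assumptions introduced here for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    level: int  # 1 = elemental task, 2 = complex task, 3 = mission segment, 4 = overall mission
    subtasks: list = field(default_factory=list)


def elemental_tasks(task):
    """Walk down from any level and collect the names of the Level I tasks beneath it."""
    if task.level == 1:
        return [task.name]
    names = []
    for sub in task.subtasks:
        names.extend(elemental_tasks(sub))
    return names


# Hypothetical decomposition of one mission; names are illustrative only.
mission = Task("radar navigation mission", 4, [
    Task("navigate to turn point", 3, [
        Task("update system position", 2, [
            Task("identify radar return", 1),
            Task("place cursor on return", 1),
        ]),
        Task("monitor INS", 2, [Task("read position display", 1)]),
    ]),
])

leaves = elemental_tasks(mission)
```

Starting the walk at a segment or complex task instead of the mission yields the elemental tasks for just that part, which mirrors how measurement decisions can be made at progressively more detailed levels.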

3. Task Relation to Performance and System Purpose

For measurement purposes, it is neither practical nor
desirable to measure all possible task conditions which might
occur in accomplishing a mission objective [Smode, et al.,
1962; Vreuls, et al., 1974]. To be practical, an attempt
should be made to simplify the analysis of tasks and remove
irrelevant measurement [Vreuls, et al., 1974]. Since
measurement is only possible on the basis of specific, observable
events, a great deal of investigation and analysis must be
accomplished to describe tasks that are representative of
accomplishing the mission purpose while at the same time are
measurable [Glaser and Klaus, 1966]. Glaser and Klaus [1966]
identified two kinds of observable performance that are useful
for performance measurement: the behavioral repertory of the
operator in the form of verbal and motor actions, and the
operator's effects on overall system performance or output.
From the measurement of either of these two observed
performances, some inference can be made about the operator's level
of skill in performing operational and describable tasks.

The  intimate  relationship  between  the  task  of  the 
operator  and  performance  measurement  of  the  operator  for  the 
purpose  of  estimating  his  level  of  skill  was  best  stated  by 
Smode,  et  al.  [1962] : 

The  behaviors  and  tasks  which  are  observed  and  measured 
necessarily  will  be  a  sampling  from  those  which  comprise 
the  complete  system  activity,  for  it  is  neither  feasible 
nor  necessary  to  measure  everything  in  order  to  evaluate 
proficiency.  What  one  evaluates  depends  on  purpose. 
Determining  those  properties  of  behavior  significant  to 
the  purpose  aids  in  defining  the  areas  of  human  behavior 
for  assessment.  In  the  interest  of  maximizing  validity 
of  measurement,  this  sampling  should  be  guided  by  the 
criticality  of  the  tasks  and  operational  segments  to 
mission  or  system  success.  As  a  rule,  those  tasks  should 
be  selected  for  measurement  on  which  good  performance 
results  in  mission  success  and  on  which  poor  performance 
means failure.

As discussed previously, identifying the purpose of
the system is important for defining performance standards
and for accurate measurement of behavior. Likewise, the
purpose of the system helps define those tasks which should be
measured, and is essential for discovering what the
relationship is between mission tasks performed by the operator and
the probability of mission success [Smode, et al., 1962;
Buckhout and Cotterman, 1963; Cotterman and Wood, 1967].
Actions that are critical to performance in that they
differentiate between success and failure in performance can only
be identified properly in terms of the ultimate purpose or
goal of the man-machine system [Smode, et al., 1962].

C.  RADAR  NAVIGATION  TASK  ANALYSIS 
1.  Background

A  task  analysis  is  a  time-oriented  description  of 
man-machine  interactions  brought  about  by  an  operator  in  ac¬ 
complishing  a  unit  of  work  with  an  item  of  the  machine,  and 
shows  the  sequential  and  simultaneous  manual  and  intellectual 
activities  of  the  man  operating,  maintaining,  or  controlling 
equipment,  rather  than  a  sequential  operation  of  the  equipment 
[Department  of  Defense,  MIL-H-46855B,  1979].  Miller  [1953] 
presented  a  more  usable  definition  of  a  task  analysis:  the 
gathering  and  organization  of  the  psychological  aspects  of  the 
indication  to  be  observed  (stimulus  and  channel) ,  the  action 
required  (response  behavior,  including  decision-making) ,  the 
skills  and  knowledge  required  for  task  performance,  and
probable  characteristic  human  errors  and  equipment
malfunctions.

A  task  analysis  is  conducted  mainly  for  the  design  of 
new  systems  or  for  improvements  to  existing  systems  and  pro¬ 
vides  basic  building  blocks  for  the  rest  of  human  engineering 
analysis  [Van  Cott  and  Kinkade,  1972] .  The  purpose  of  the 
task  analysis  presented  here  is  to  improve  current  performance 
measurement  of  the  B/N  in  the  A-6E  WST,  and  will  be  discussed 
more  fully  in  respect  to  this  purpose  in  Chapter  VII. 

There  are  several  methods  of  conducting  a  task  analysis 
which  are  classified  as  either  empirical,  analytical,  or  some 
combination  of  both.  The  empirical  methods  rely  on  industrial 
engineering  techniques  such  as  time  and  motion  study  [Mundel, 
1978]  while  the  analytical  techniques  involve  the  use  of  expert 
opinions  through  interviews  or  questionnaires.  Van  Cott  and 
Kinkade  [1972]  advocated  seeking  information  from  a  wide 
variety  of  sources  and  employing  more  than  one  technique  in 
order  to  adequately  describe  what  an  operator  actually  does 
in  a  system. 

"A  completely  developed  task  analysis  will  present  a 
detailed  description  of  the  component  behavioral  skills  that 
the  accomplishment  of  the  task  entails,  the  relationships 
among  those  components,  and  the  function  of  each  component  in 
the  total  task  [Anderson  and  Faust,  1974]."  Since  a  task
analysis  involves  breaking  down  a  task  into  behavioral
components  for  the  purpose  of  performance  measurement,  the
question  of  when  to  stop  subdividing  the  task  is  most  impor¬ 
tant.  Anderson  and  Faust  [1974]  proposed  that  enough  task 
analysis  detail  is  reached  when  the  intact  or  component  skill 
is  part  of  the  student's  entering  behavior. 

The  use  of  a  task  analysis  for  the  purpose  of  perfor¬ 
mance  measurement  assumes  that  behavior  can  be  analyzed  in 
terms  of  basic  components  that  are  conceptually  identified 
in  a  way  that  is  convenient  and  agreeable  to  people  and  that 
specific  measurement  techniques  appropriate  for  the  various 
behavioral  components  exist  [Smode,  et  al.,  1962].  This 
assumption  becomes  less  theory  and  more  factual  in  light  of 
research  conducted  in  the  helicopter  community.  Locke,  et  al. 
[1965] ,  in  a  study  of  over  500  primary  helicopter  students 
using  the  OH-23D  helicopter,  reported  that  nearly  all  complex 
man-machine  maneuvers  can  be  broken  down  into  independent  com¬ 
ponent  parts  with  associated  component  abilities.  A  more 
recent  study  by  Rankin  and  McDaniel  [1980]  assessed  helicopter 
flight  task  proficiency  using  a  Computer  Aided  Training  Evalu¬ 
ation  and  Scheduling  (CATES)  system,  where  flight  maneuvers 
were  divided  into  tasks  that  were  used  for  performance  meas¬ 
urement  and  evaluation,  and  were  then  utilized  to  determine 
overall  aviator  proficiency. 

Just  as  no  two  task  analyses  are  ever  the  same,  there 
may  be  multiple  sets  of  operator  behavior  possible  to  accom¬ 
plish  the  tasks  as  described  in  one  task  analysis  [Fleishman, 
1967;  Vreuls  and  Wooldridge,  1977].  This  major  limitation
of  using  a  task  analysis  to  measure  performance  of  an  oper¬ 
ator  reaffirms  the  idea  of  measuring  all  observable  system 
outputs  and  establishing  the  relationship  among  operator 
actions,  system  outputs,  and  mission  success  or  failure.  By 
empirically  validating  a  task  analysis  in  the  operational  en¬ 
vironment  and  establishing  the  above  mentioned  relationships, 
any  limitations  imposed  by  differences  in  operator  strategy 
on  the  measurement  system  may  be  circumvented. 

2.  Previous  A-6  Task  Analyses

a.  Naval  Flight  Officer  Function  Analysis 

In  1972,  the  Chief  of  Naval  Operations  requested 
the  Naval  Aerospace  Medical  Research  Laboratory  (NAMRL)  to 
conduct  a  series  of  investigations  analyzing  the  operational 
functions  of  the  Naval  Flight  Officer  (NFO)  for  the  purposes 
of  revising  NFO  training  programs  and  to  aid  in  determining 
future  training  equipment  requirements  and  characteristics. 
Addressing  NFOs  of  P-3B/C,  RA-5C,  A-6A,  EA-6B,  E-2C,  and  F-4B/J
aircraft,  the  investigations  determined  the  roles,  duties  and 
tasks  performed  by  the  NFO  in  a  given  aircraft,  the  percent  of 
NFOs  performing  a  given  task/duty,  the  time  and  effort  spent 
on  various  roles,  duties  and  tasks,  and  finally,  the  task 
criticality. 

The  study  of  interest  for  the  purposes  of  this 
thesis  involves  the  analysis  of  the  B/N  operational  functions 
in  the  A-6A  [Doll,  et  al.,  1972].  The  procedure  used  for  the 
function  analysis  was  based  on  a  method  of  job  analysis
developed  by  the  USAF  Personnel  Research  Laboratory  at  Lack- 
land  Air  Force  Base,  Texas.  The  principal  method  of  analyzing 
functions  was  an  inventory  of  activities  approach  that  com¬ 
bined  features  of  the  checklist,  open-ended  questionnaire, 
and  interview  methods. 

The  results  of  analyzing  the  A-6A  B/N  tasks  were
based  on  84  surveys  completed  by  operational  B/Ns.  Of  six
major  operational  roles  identified  (communication,  navigation, 
tactics,  sensors,  armament,  and  system  data  processing),  more 
time  and  effort  was  spent  (28  percent)  in  flight  by  the  B/N 
performing  the  navigation  role  than  any  other  single  role. 
Within  the  navigation  role,  five  duties  were  identified:  (1) 
navigate  using  Inertial  Doppler  systems,  (2)  using  TACAN,  (3)
using  ADF/UHF-ADF,  (4)  using  visual  references/Dead  Reckoning,
and  (5)  using  radar.  Over  98  tasks  within  those  five  duties
were  listed.  Amount  of  time  and  effort  as  well  as  the  criti¬ 
cality  of  each  task  was  recorded  and  a  rank  order  listing  of 
all  tasks  for  these  two  categories  was  presented. 
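
The two rank-order listings described above can be illustrated with a minimal sketch. The task names and ratings below are hypothetical placeholders, not data from Doll, et al. [1972]; the sketch only shows how a single set of surveyed tasks yields two separate rank orders, one for time and effort and one for criticality.

```python
from typing import NamedTuple


class TaskRating(NamedTuple):
    name: str
    time_effort: float   # hypothetical mean time-and-effort rating
    criticality: float   # hypothetical mean criticality rating


def rank_order(tasks, key):
    """Return task names sorted from highest to lowest on the given rating."""
    ordered = sorted(tasks, key=lambda t: getattr(t, key), reverse=True)
    return [t.name for t in ordered]


# Illustrative values only; the published rankings are not reproduced here.
ratings = [
    TaskRating("update position using radar", 4.2, 4.8),
    TaskRating("tune TACAN station", 2.1, 2.5),
    TaskRating("monitor inertial platform", 4.6, 3.9),
]

by_effort = rank_order(ratings, "time_effort")
by_criticality = rank_order(ratings, "criticality")
```

A task may rank high on one listing and lower on the other, which is why the two orders are kept separate rather than merged into one score.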

In  developing  a  task  analysis  for  the  B/N  during 
radar  navigation  (presented  later  in  this  section) ,  this  A-6A 
function  analysis  for  the  B/N  shows  the  importance  of  naviga¬ 
tion  in  terms  of  time  and  effort  spent  by  B/Ns  in  the  opera¬ 
tional  environment.  A  measurement  system  that  accurately 
describes  B/N  performance  during  radar  navigation  would  be 
extremely  useful  from  this  standpoint.  The  time  and  effort
and  criticality  rankings  of  this  source  were  also  useful  for
those  tasks  that  corresponded  with  the  same  task  or  subtask 
in  the  current  effort,  and  in  developing  a  performance  meas¬ 
urement  system  that  encompassed  critical  tasks  in  terms  of 
their  contribution  to  overall  mission  success. 

b.  Grumman  A-6E  TRAM  Training  Program 

Grumman  Aerospace  Corporation  completed  a  study 
on  the  application  of  ISD  methodology  to  the  design  of  a  train¬ 
ing  program  for  A-6E  TRAM  FRS  pilots  and  B/Ns  in  mid-1976. 
Comprised  of  over  seven  volumes,  the  study  included  a  task 
analysis,  development  of  SBOs,  media  analysis,  and  formulation 
of  lesson  specifications  [Campbell,  1975;  Campbell  and  Sohl, 
1975;  Campbell,  et  al.,  1975;  Hanish  and  Feddern,  1975;  Graham, 
et  al.,  1975;  Campbell,  et  al.,  1977].  The  task  analysis  phase 
of  the  ISD  process  was  performed  jointly  by  a  team  consisting 
of  Navy  Subject  Matter  Experts  (SMEs)  and  Grumman  training 
psychologists,  educational  specialists,  and  flight  test  per¬ 
sonnel.  Tasks  were  to  be  identified  based  on  performance  in 
the  operational  environment  and  described  in  sufficient  depth 
to  permit  an  identification  of  the  underlying  skills  and 
knowledge  required  by  the  crewmen  to  perform  the  task.  A 
hierarchical  approach  for  describing  the  pilot  and  B/N  behav¬ 
iors  during  a  mission  resulted  in  three  levels  of  description: 
major  mission  events,  the  tasks  which  comprise  the  events,  and 
the  steps  which  describe  the  incremental  actions  an  aircrewman 
must  take  to  complete  a  task. 

The  first  result  of  the  task  analysis  effort  was
a  comprehensive  task  listing  comprised  of  over  400  nominal
pilot  tasks,  each  with  an  average  of  approximately  10  steps;
70  airframe  emergency  sequences  involving  an  average  of  7-10
steps  each,  35  system  malfunctions,  and  more  than  200  nominal 
B/N  tasks  with  an  average  of  10  steps  each.  The  listings 
represented  tasks  for  which  training  needed  to  be  conducted 
at  the  FRS  level.  A  Task  Analysis  Record  (TAR)  form  was  util¬ 
ized  for  each  task  to  ascertain  the  following:  (1)  crewman 
performing  task,  (2)  where  training  was  given,  (3)  skills  and 
knowledge  required  by  task,  (4)  conditions  under  which  task 
is  performed,  (5)  cues  involved  in  performance,  (6)  aircraft 
system  involved,  (7)  degree  of  difficulty,  (8)  factors  in  task 
difficulty,  (9)  task  criticality,  (10)  factors  in  performance 
measurement,  and  (11)  other  special  factors  which  impacted  on 
training.  Because  the  TAR  was  used  for  the  purpose  of  in¬ 
structional  sequencing  and  blocking  downstream  in  the  ISD 
process  and  as  an  aid  in  selecting  appropriate  instructional 
strategies,  it  was  not  published  as  part  of  the  study. 

The  actual  task  analysis  appears  in  the  form  of 
an  ISD  record  developed  from  the  TAR  and  SBOs.  Objectives 
were  classified  on  the  basis  of  eight  major  taxonomic  cate¬ 
gories:  (1)  knowledge,  (2)  comprehension,  (3)  discrimination, 
(4)  application,  (5)  analysis,  (6)  synthesis,  (7)  evaluation, 
and  (8)  complex  performance.  This  taxonomy  was  retained  for 
the  current  thesis  task  analysis  effort  and  will  be  defined 
in  Table  VII.  The  ISD  record  then  contained  the  SBO,  task
identification  data,  condition/constraints,  performance  stand¬ 
ard,  taxonomic  data,  a  criterion  test  statement,  and  test  type 
and  format. 

The  Grumman  task  analysis  effort  becomes  useful 
to  the  current  effort  of  developing  a  task  analysis  for  the 
B/N  during  the  radar  navigation  maneuver,  and  using  that  task 
analysis  for  the  purpose  of  performance  measurement.  In  this 
respect,  the  Grumman  study  was  used  as  a  guiding  outline  in 
developing  the  current  task  analysis. 

Prophet  [1978]  reviewed  past  ISD  efforts  in  Navy 
fleet  aviation  training  program  development  that  included  the 
Grumman  A-6E  ISD  program,  and  made  the  following  comments  in 
reference  to  measurement  and  evaluation  for  that  program: 

(1)  Methodologies  being  followed  did  not  necessarily 
require  a  systematic  treatment  of  measurement  and 
evaluation. 

(2)  No  discussion  of  the  mechanics  of  measurement  for 
standards  found  in  SBOs  is  given. 

(3)  While  a  clear  recognition  of  when  and  where  meas¬ 
urement  will  take  place  is  addressed,  no  information 
is  given  concerning  how. 

(4)  The  problems  of  flight  versus  non-flight  measurement 
were  not  discussed. 

Although  some  criticism  may  be  found  in  the  lack  of  measure¬ 
ment  mechanics  from  the  Grumman  task  analysis  effort,  it  is 
not  a  surprising  revelation  given  the  purpose  of  their  task 
analysis.  Their  effort  is  still  deserving  in  the  light  of  the 
complexity  of  the  aircrew  and  A-6E  combination,  and  this  author 
used  their  unclassified  task  analysis  material  in  defining
and  describing  exactly  what  a  B/N  does  during  a  radar  navi¬ 
gation  maneuver. 

c.  Perceptronics  Incorporated  Decision  Task  Analysis

In  early  1980,  a  study  designed  to  identify  sig¬ 
nificant  aircrew  decisions  in  Navy  attack  aircraft  was  per¬ 
formed  by  Perceptronics,  Inc.  for  the  Naval  Weapons  Center, 
China  Lake,  California.  The  study  selected  two  mission  scen¬ 
arios  that  were  representative  of  A-6E  and  A-7E  aircraft: 
close  air  support  and  fixed  target  attack  [Saleh,  et  al.,  1980]. 
A  mission  analysis  followed  by  an  Aircrew/Avionics  Functions 
Analysis  was  performed  on  each  scenario.  Finally,  a  decision 
identification  analysis  was  performed  which  resulted  in  a 
listing  of  significant  decisions  in  each  mission  for  each  air¬ 
crew.  The  study  results  provided  information  on  decision  type, 
difficulty,  and  criticality. 

Limited  use  was  made  of  this  decision  identification
analysis  because  of  the  scenarios  developed,  the  purpose  of
that  task  analysis  (decision-making),  and  its  partial
dependence  upon  the  previous  efforts  by  Grumman.
Nevertheless,  a  few  decisional  tasks  were  reviewed  for  use  in 
the  current  effort. 

3.  Current  Task  Analysis  for  Performance  Measurement

Using  the  research  provided  by  the  Naval  Flight  Officer 
function  analysis,  the  Grumman  A-6E  TRAM  training  program  task 
analysis,  and  the  Perceptronics,  Inc.  decision  task  analysis,
as  discussed  previously,  a  task  analysis  was  performed  with
the  purpose  of  measuring  B/N  performance  during  radar  navi¬ 
gation  in  the  A-6E  WST.  The  results  of  that  effort,  in  the 
form  of  a  task  listing  (Appendix  A),  a  task  analysis  (Appendix
C),  and  a  Mission  Time  Line  Analysis  (MTLA;  Appendix  D),  are
each  presented  separately  below.

a.  Task  Listing

As  shown  in  Appendix  A,  the  radar  navigation  man¬ 
euver  was  divided  into  three  segments:  (1)  after  takeoff  checks, 
(2)  navigation  to  the  initial  point  (IP) ,  and  (3)  navigation 
to  the  turn  point  (TP) .  The  navigation  to  TP  segment  (3)  was 
the  portion  of  the  A-6E  CAINS  flight  within  the  scope  of  this 
thesis,  and  was  the  segment  of  interest  to  be  later  expanded 
upon  in  the  form  of  a  task  analysis  and  MTLA  that  will  be  dis¬ 
cussed  later  in  this  section. 

The  following  definitions  will  explain  the  signif¬ 
icance  of  the  symbology  within  the  navigation  to  TP  segment  of 
Appendix  C: 

(1)  Tn  -  Task  number,  where  the  number  is  represented  by  "n."

(2)  Sn  -  Subtask  number.

(3)  (a)  -  Subtask  element,  where  the  element  is  represented
     by  the  lower  case  letter  "a,"  or  other  letters.

The  nomenclature  for  actual  switches,  keys,  controls,  or 
buttons  in  the  A-6E  CAINS  cockpit  is  underlined  throughout 
the  task  listing  (e.g.,  Rcvr  control).  Discrete  or  continuous
settings  for  each  switch,  key,  control,  or  button  are  to  the
far  right  of  the  task,  subtask,  or  subtask  element,  and  are
separated  by  a  line  of  periods.
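
The listing conventions just described (Tn, Sn, lettered subtask elements, and a line of periods leading to the setting) can be sketched as a small data structure. This is only an illustration under stated assumptions: the class names, the 60-character line width, and the example task are hypothetical and are not taken from Appendix A, and the underlining of control nomenclature is not reproduced.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    letter: str            # subtask element, e.g. "a"
    action: str
    setting: str = ""


@dataclass
class Subtask:
    number: int            # the "n" in Sn
    action: str
    setting: str = ""
    elements: list = field(default_factory=list)


@dataclass
class Task:
    number: int            # the "n" in Tn
    action: str
    subtasks: list = field(default_factory=list)


def listing_line(label, action, setting, width=60):
    """Format one line: label, action, then periods leading to the setting."""
    left = f"{label} {action} "
    if not setting:
        return left.rstrip()
    dots = "." * max(3, width - len(left) - len(setting) - 1)
    return f"{left}{dots} {setting}"


def render(task):
    """Return the task, its subtasks, and their elements as indented lines."""
    lines = [listing_line(f"T{task.number}", task.action, "")]
    for s in task.subtasks:
        lines.append(listing_line(f"  S{s.number}", s.action, s.setting))
        for e in s.elements:
            lines.append(listing_line(f"    ({e.letter})", e.action, e.setting))
    return lines


# Hypothetical example entry, for illustration only.
task = Task(1, "Tunes search radar", [
    Subtask(1, "Adjusts Rcvr control", "AS REQUIRED",
            [Element("a", "Observes radar display")]),
])
lines = render(task)
```

Printing `lines` one per row gives the task, its subtask with the setting at the far right, and the subtask element indented beneath it.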

The  choice  of  action  verbs  with  which  to  describe
behaviors  was  difficult,  due  to  the  lack  of  standardization
in  both  the  science  of  analyzing  tasks  and  in  aircrew
performance  measurement  research.  The  necessity  of  employing
action  verbs  that  described
simple  and  easily  observable  activities  and  were  easily  identi¬ 
fied  in  terms  of  performance  measurement  was  paramount  to  the 
current  effort.  A  hybrid  taxonomy,  using  31  action  verbs  as 
shown  in  Table  VI,  was  developed  from  earlier  work  by  Angell, 
et  al.  [1964]  that  was,  in  a  sense,  later  validated  by  Chris¬ 
tensen  and  Mills  [1967]  in  an  analysis  of  locating  represent¬ 
ative  data  on  human  activities  in  complex  operational  systems. 
Using  a  later  study  by  Oller  [1968]  in  the  form  of  a  human
factors  data  thesaurus  as  applied  to  task  data,  the  original 
50  action  verbs  used  by  Angell,  et  al.  [1964]  were  reduced  to
a  total  of  31  action  verbs  by  eliminating  redundant  synonyms 
and  by  using  the  recommended  acceptable  action  verbs  and  nouns. 
Except  for  the  reduction  of  action  verbs  (specific  behaviors) , 
the  remainder  of  the  original  taxonomy  was  preserved.  For  the 
convenience  of  the  reader,  the  31  action  verbs  or  specific 
behaviors  utilized  in  the  current  task  analysis  are  presented 
as  a  glossary  (Appendix  B) . 
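
The reduction of the verb list by eliminating redundant synonyms can be sketched as follows. The synonym pairs shown are hypothetical examples chosen for illustration; the actual mapping was taken from the thesaurus cited above and is not reproduced here.

```python
# Hypothetical synonym-to-preferred-verb mapping (assumption, for illustration).
CANONICAL = {
    "presses": "depresses",
    "scans": "monitors",
    "turns": "rotates",
}


def reduce_verbs(verbs, canonical=CANONICAL):
    """Map each verb to its preferred form and drop duplicates,
    keeping the order in which preferred verbs first appear."""
    seen = []
    for v in verbs:
        preferred = canonical.get(v.lower(), v.lower())
        if preferred not in seen:
            seen.append(preferred)
    return seen


original = ["Depresses", "Presses", "Monitors", "Scans", "Rotates", "Turns", "Reads"]
reduced = reduce_verbs(original)
```

In this toy case seven verbs collapse to four; the same kind of collapsing, applied with the recommended terms, took the original 50 verbs to 31.
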

TABLE VI: CLASSIFICATION OF BOMBARDIER/NAVIGATOR BEHAVIORS

PROCESSES       ACTIVITIES                     SPECIFIC BEHAVIORS

Perceptual      Searching for information      Checks, Monitors, Observes,
                Receiving information          Reads
                Identifying objects,
                actions, or events

Mediational     Information processing         Initiates, Records, Uses

                Problem solving and            Checkouts, Compares, Continues,
                decision-making                Delays, Determines, Evaluates,
                                               Performs, Repeats, Selects,
                                               Troubleshoots

Communication   Communicating                  Alerts, Informs, Instructs

Motor           Simple/discrete                Activates, Depresses, Places,
                                               Pushes, Throws, Sets

                Complex/continuous             Adjusts, Inserts, Positions,
                                               Rotates, Tunes

Source:  Angell,  et  al.  [1964]  (adapted)

b.  Task  Analysis

A  task  analysis  for  the  specific  purpose  of  meas¬ 
uring  B/N  performance  during  radar  navigation  was  performed 
and  is  presented  as  Appendix  C.  As  previously  discussed,  only 
segment  three,  navigation  to  TP,  was  examined  during  the  task 
analysis  to  limit  the  scope  of  this  study.  Since  the  concepts 
of  segment  and  tasks  have  already  been  addressed,  the  seven 
columns  of  the  A-6E  TRAM  radar  navigation  task  analysis  form 
in  Appendix  C  will  now  be  explained  in  detail,  using  guidance 
provided  by  Van  Cott  and  Kinkade  [1972] ,  Anderson  and  Faust 
[1974] ,  Pickrel  and  McDonald  [1964] ,  Smode,  et  al.  [1962] ,  and 
Rosenmayer  and  Asiala  [1976]  : 

(1)  Subtask  -  a  component  activity  of  a  task.  Within  a
     task,  collectively  all  subtasks  comprise  the  task.
     Subtasks  are  represented  by  the  letter  "S"  followed
     immediately  by  a  numeral.  Subtask  elements  are
     represented  by  a  small  letter  in  parentheses.

(2)  Feedback  -  the  indication  of  adequacy  of  response  or
     action.  Listed  as  VISUAL,  TACTILE,  AUDITORY,  or
     VESTIBULAR  and  located  in  the  subtask  column  for
     convenience  only.

(3)  Action  Stimulus  -  the  event  or  cue  that  instigates
     performance  of  the  subtask.  This  stimulus  may  be  an
     out-of-tolerance  display  indication,  a  requirement  of
     periodic  inspection,  a  command,  a  failure,  etc.

(4)  Time  -  the  estimated  time  in  seconds  to  perform  the
     subtask  or  task  element  calculated  from  initiation  to
     completion.

(5)  Criticality  -  the  relationship  between  mission  success
     and  the  below-minimum  performance  or  required  excessive
     performance  time  of  a  particular  subtask  or  subtask
     element.  "High"  (H)  indicates  poor  subtask  performance
     may  lead  to  mission  failure  or  an  accident.  "Medium"
     (M)  indicates  the  possibility  of  degraded  mission
     capability.  "Low"  (L)  indicates  that  poor  performance
     may  have  little  effect  on  mission  success.

(6)  Potential  Error  -  errors  are  classified  as  either
     failure  to  perform  the  task  (OMIT),  performing  the  task
     inappropriately  in  time  or  accuracy  (COMMIT),  or
     performing  sequential  task  steps  in  the  incorrect  order
     (SEQUENTIAL).

(7)  Skills  Required  -  the  taxonomy  of  training  objectives
     used  for  the  Grumman  task  analysis  was  retained  and  is
     presented  in  Table  VII  [Campbell,  et  al.,  1977].  This
     concept  will  be  discussed  in  more  detail  later  in  this
     section.

(8)  Performance  Measure  Metrics  -  a  candidate  metric  which
     may  best  describe  the  successful  performance  of  the
     task  or  a  genuine  display  of  the  required  skills.  The
     types  of  metrics  suggested  were  classified  as  TIME
     (time  in  seconds  from  start  to  finish  of  task),  T-S
     (time-sharing  or  proportion  of  time  that  particular
     task  is  performed  in  relation  to  other  tasks  being
     performed  in  the  same  time  period),  R-T  (reaction  time
     in  seconds  from  the  onset  of  an  action  stimulus  to  task
     initiation),  ACC  (accuracy  of  task  performance),  FREQ
     (number  of  task  occurrences),  DEC  (decisions  made  as  a
     correct  or  incorrect  choice  depending  on  the  particular
     situation  and  mission  requirements),  QUAL  (quality  of  a
     task,  especially  in  regards  to  radar  scope  tuning
     quality),  and  SUBJ  (subjective  observation  or
     comprehension  of  task  execution  success  by  an
     instructor).
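
The entries defined above can be sketched as a single record per subtask. This is a minimal illustration, not the Appendix C form itself: the class and field names are assumptions, the example values are hypothetical, and the two helper functions simply restate the R-T and T-S definitions given in item (8).

```python
from dataclasses import dataclass
from enum import Enum


class Criticality(Enum):
    HIGH = "H"
    MEDIUM = "M"
    LOW = "L"


class PotentialError(Enum):
    OMIT = "OMIT"
    COMMIT = "COMMIT"
    SEQUENTIAL = "SEQUENTIAL"


FEEDBACK = {"VISUAL", "TACTILE", "AUDITORY", "VESTIBULAR"}
METRICS = {"TIME", "T-S", "R-T", "ACC", "FREQ", "DEC", "QUAL", "SUBJ"}


@dataclass
class SubtaskRecord:
    subtask: str
    feedback: set
    action_stimulus: str
    time_s: float                 # estimated seconds, initiation to completion
    criticality: Criticality
    potential_errors: set
    skills_required: list
    metrics: set

    def __post_init__(self):
        # Reject codes that are not among those defined in the text.
        unknown = (self.feedback - FEEDBACK) | (self.metrics - METRICS)
        if unknown:
            raise ValueError(f"unknown codes: {sorted(unknown)}")
        if self.time_s < 0:
            raise ValueError("time must be non-negative")


def reaction_time(stimulus_onset_s, task_start_s):
    """R-T: seconds from onset of the action stimulus to task initiation."""
    return task_start_s - stimulus_onset_s


def time_share(task_time_s, period_s):
    """T-S: proportion of a time period spent on one particular task."""
    return task_time_s / period_s


# Hypothetical example record, for illustration only.
record = SubtaskRecord(
    subtask="S1 Adjusts Rcvr control",
    feedback={"VISUAL"},
    action_stimulus="degraded radar picture",
    time_s=5.0,
    criticality=Criticality.HIGH,
    potential_errors={PotentialError.OMIT, PotentialError.COMMIT},
    skills_required=["Discrimination", "Complex Procedure"],
    metrics={"TIME", "ACC", "QUAL"},
)
```

Restricting feedback and metric codes to fixed sets keeps every record comparable when candidate measures are later drawn from the metrics column.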

Due  to  the  lack  of  operational  data,  the  task 
analysis  was  derived  analytically  with  close  attention  being 
paid  to  consistency  with  previous  A-6  task  analysis  efforts.
The  validation  of  any  task  analysis  can  only  occur  when  it  is 
subjected  to  the  operational  environment  for  repeated  empirical 
analysis.  Unfortunately,  time,  cost  and  system  availability 
constraints  precluded  the  execution  of  this  important  phase. 

TABLE  VII:  TAXONOMY  OF  SKILLS  REQUIRED


Knowledge 

Technological  -  Learning  "how  to"  perform  a  single 
switch  and  control  configuring  procedure.  Learning 
"how  to"  read  meters,  digital  displays,  scopes, 
lighting  displays,  etc.  In  general,  learning  "how  to."

Formal  -  Learning  the  meaning  of  special  symbols, 
acronyms,  words,  nomenclature,  etc. 

Descriptive  -  To  describe  "what  is"  and  "what  was": 
facts,  data,  special  information  about  systems,  sub¬ 
systems,  equipment,  weapons,  tactics,  missions,  etc. 

Concepts  and  Principles  -  Fundamental  truths,  ideas, 
opinions  and  thoughts  formed  from  generalizations 
of  particulars. 

Comprehension 

Understanding  the  meaning  of  meter  readings,  scope,  digital 
and  lighting  displays.  Understanding  the  switch  and  control 
configuring  procedure,  i.e.,  the  reason  for  a  specified 
sequence,  the  reason  for  a  switch  or  control  position,  the 
reason  for  a  verification,  etc. 

Grasping  the  meaning  of  concepts  and  principles,  i.e., 
understanding  the  basic  principles  of  infrared  and  radar 
detection. 

Understanding  the  meaning  of  facts,  data,  specific  informa¬ 
tion,  etc. 

Discrimination 

Distinguishing  among  different  external  stimuli  and  making 
appropriate  responses  to  them,  e.g.,  scanning  gages  for
out-of-tolerance  trends.  Also  includes  the  recognition  of 
the  essential  similarity  among  a  class  of  objects  or  events, 
e.g.,  classifying  aircraft  types  or  radar  return  images.


Source:  Campbell,  et  al.  [1977]. 
TABLE  VII  (Continued)


Application 

Simple  Procedure  -  A  demonstration  of  a  simple  learned 
procedure  in  the  cockpit  or  simulator  requiring  not 
more  than  simply  repeating  required  switch  and  control 
configuring  and  simple  visual  verification  (i.e., 
advisory  light  status) . 

Complex  Procedure  -  A  demonstration  of  a  learned 
procedure  in  a  cockpit  or  simulator  that  requires 
differentiating  or  distinguishing  between  readings 
on  meters,  digital  displays,  and  images  on  video  and 
radar  displays  and  interpreting  and  applying  the 
meaning  of  the  readings  and  images. 

General  -  Using  learned  materials  in  new  and  concrete 
situations  (e.g. ,  using  rules,  methods,  concepts, 
principles,  procedures,  etc.). 

Analysis 

A  demonstration  of  a  learned  process  of  breaking  down 
material  (i.e.,  data,  other  information)  into  its  compon¬ 
ents  so  that  it  may  be  evaluated  with  respect  to  crew's 
safety,  mission  success,  A/C  maintenance,  etc. 

Synthesis 

A  demonstration  of  learned  process,  i.e.,  putting  tactical 
elements  together  (e.g.,  weapons,  targets,  available 
systems,  A/C  capability,  etc.)  to  formulate  a  mission. 

Evaluation 

A  demonstration  of  a  learned  process  of  assessing  or  judging 
a  system  or  situation,  based  on  criteria  (i.e.,  data,  rules, 
available  equipment,  conditions,  etc.)  and  then  reaching  a 
conclusion  based  on  this  assessment. 

Complex  Performance 


A  demonstration  that  requires  psychomotor  skills  and/or 
critical  thinking  skills  usually  requiring  practice. 

As  it  stands,  a  reasonable  assumption  of  the  existence  of
some  face  validity  in  the  current  task  analysis  can  be  made 
in  the  light  of  the  author's  operational  experience  as  a  B/N 
in  the  A-6E  CAINS  aircraft  (over  600  hours)  and  the  dependence 
of  the  task  analysis  upon  previous  task  analysis  efforts,  even 
though  none  of  the  previous  efforts  were  formally  validated 
by  empirical  methods.  The  current  task  analysis  was  also 
informally  reviewed  by  other  A-6E  B/Ns  before  finalization  of 
the  effort. 

The  purpose  of  the  task  analysis  was  to  improve 
current  performance  measurement  of  the  B/N  during  radar
navigation  in  the  WST  by  providing  performance  measure  metrics
(right-hand  column  of  Appendix  C)  that  are  possible  candidates 
for  describing  successful  task  performance  or  B/N  skill  acqui¬ 
sition.  Several  hundred  metrics  are  available  from  which  a 
candidate  set  can  be  chosen  based  on  the  initial  measure  selec¬ 
tion  criteria  as  previously  discussed  in  Chapter  IV.  From  the 
"performance  measure  metrics"  column  of  Appendix  C,  several 
potential  candidate  measures  were  identified  and  will  be  com¬ 
bined  with  potential  measures  from  Table  II  in  Chapter  IV  and 
presented  as  part  of  the  final  candidate  measure  set  listed 
in  Table  XI  (Chapter  VII). 

c.  Mission  Time  Line  Analysis 

An  MTLA  is  a  graphical  analysis  which  relates  the 
sequence  of  tasks  to  be  performed  by  the  operator  to  a  real 
time  basis  [Matheny,  et  al.,  1970].  The  purpose  of  an  MTLA 
as  used  in  the  current  study  is  to  identify  those  performance
measurement  points  within  a  man-machine  system  where  standards 
of  accuracy  and  time  may  be  applied  in  the  evaluation  process. 
Essentially  a  bar  chart,  an  MTLA  for  the  navigation-to-TP
segment  of  the  radar  navigation  maneuver  is  presented  as  Appendix
D.  The  time  of  execution  for  each  subtask  was  extracted  from 
estimated  completion  times  on  the  task  analysis  record  form 
(Appendix  C) .  Time  was  estimated  with  the  assumption  that 
sensing  conditions  were  good  and  the  B/N  was  highly  skilled. 
Darkly  shaded  time  lines  represent  tasks  that  demand  full  mental 
attention  whereas  shaded  time  bars  represent  "monitoring"  tasks 
or  "troubleshooting"  tasks  that  may  not  have  to  be  executed. 
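
The bar-chart form of an MTLA can be sketched in text. The subtask labels, start times, and durations below are hypothetical and are not taken from Appendix D; the choice of "#" for full-attention tasks, "-" for monitoring or troubleshooting tasks, and five seconds per character are likewise assumptions made only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class TimeLineEntry:
    label: str
    start_s: int          # start time on the mission clock, in seconds
    duration_s: int       # estimated completion time, in seconds
    full_attention: bool  # True for tasks demanding full mental attention


def bar(entry, seconds_per_char=5):
    """Draw one time bar, offset by its start time on a common scale."""
    mark = "#" if entry.full_attention else "-"
    offset = entry.start_s // seconds_per_char
    length = max(1, entry.duration_s // seconds_per_char)
    return f"{entry.label:<4}|" + " " * offset + mark * length


def mission_duration(entries):
    """Latest finish time across all entries, in seconds."""
    return max(e.start_s + e.duration_s for e in entries)


# Hypothetical entries: two sequential tasks and one concurrent monitoring task.
entries = [
    TimeLineEntry("S1", 0, 20, True),
    TimeLineEntry("S2", 20, 10, True),
    TimeLineEntry("S3", 0, 30, False),
]
chart = [bar(e) for e in entries]
total_s = mission_duration(entries)
```

Because every bar shares one time scale, concurrent tasks such as the monitoring entry line up under the tasks they overlap, which is where time-sharing and time-to-perform measures can be read off.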

The  MTLA  is  a  performance  measurement  source  for 
both  the  identification  of  critical  subtasks  and  the  use  of 
time  to  perform  as  a  measure  of  skilled  behavior.  Thus,  the 
MTLA  was  utilized  to  identify  candidate  performance  measures 
as  found  in  the  "performance  measure  metric"  column  of  the 
task  analysis  record  form  (Appendix  C)  that  were  later  used 
for  the  final  candidate  measure  set  as  will  be  described  in 
Chapter  VII  (Table  XI). 

4.  B/N  Skills  and  Knowledge

This  section  will  relate  current  skill  acquisition 
principles  to  performance  measurement,  and  present  a  skill 
acquisition  model  of  the  relationship  between  skill  acquisi¬ 
tion,  the  task,  performance  measurement,  and  performance 
evaluation.  A  great  deal  of  discussion  about  the  concept  of 
skill,  how  skill  is  attained,  and  how  skill  acquisition  is
measured  can  be  found  in  the  literature  from  such  diverse 
areas  as  private  industry,  control  theory,  information  pro¬ 
cessing,  and  education  [Bilodeau,  1966  and  1969;  Jones,  1970; 
Welford,  1971;  Hulin  and  Alvares,  1971  and  1971;  Singleton, 
1971;  Leshowitz,  et  al.,  1974;  Shipley,  1976;  Welford,  1976].
Despite  the  global  interest  in  skill,  this  discussion  will  be 
limited  to  aircrew  skill  acquisition  and  the  measurement  of 
that  skill. 

a.  Definition  of  Skill 

Skill  may  be  defined  as  the  ability  to  perform 
given  tasks  successfully  or  competently  in  relation  to  speci¬ 
fied  standards  [Cureton,  1951;  Senders,  1974;  Smit,  1976; 
Prophet,  1978].  A  more  precise  definition  of  skill  is  offered
by  Connelly,  et  al.  [1974]:

The  ability  to  use  knowledge  to  perform  manual  operations 
in  the  achievement  of  a  specific  task  objective  in  a 
manner  which  provides  for  the  elimination  of  irrelevant 
action  and  erroneous  response.  This  conceptualization 
exists  only  in  conjunction  with  an  individual  task  and 
is  reflected  in  the  quality  with  which  this  task  is 
performed.

Most  definitions  of  skill  rely  on  the  fundamental  concept  that 
the  use  of  capacities  efficiently  and  effectively  as  the  result 
of  experience  and  practice  would  generally  characterize  skill 
[Welford,  1976].  Indeed,  the  concept  of  skill  cannot  be  well 
defined  due  to  the  diversity  of  its  nature  and  remains  more 
or  less  an  amorphous  quantity,  best  described  by  its  characteristics
[Singleton,  1971]:

1.  It  is  continuous,  there  is  always  an  extensive  overlap
and  interaction.  Even  in  principle,  it  cannot  be 
analyzed  by  separation  into  discrete  units  along  either 
space  or  time  axes. 

2.  It  involves  all  the  stages  of  information  processing 
identifiable  in  the  organism,  basically  inputs,  pro¬ 
cessing  and  outputs. 

3.  It  is  learned  and  therefore  highly  variable  within  and 
between  individuals. 

4.  There  is  a  purpose,  objective,  or  goal  providing  mean¬ 
ing  to  the  activity. 

b.  Skill  Acquisition 

The  development  of  skill,  as  previously  discussed, 
is  due  mainly  to  the  effects  of  practice  and  experience  on  the 
use  of  basic  capacities.  Therefore,  the  acquisition  of  skill 
appears  to  result  from  learning  and  seems  to  improve  the  effi¬ 
ciency  and  effectiveness  of  underlying  basic  capacities 
[Welford,  1976].  Singleton  [1971]  advanced  the  idea  that
skill  develops  by  selectivity  and  by  the  integration  of  activ¬ 
ities.  For  most  skill  development  theories,  it  is  generally 
agreed  that  as  learning  a  new  task  takes  place,  operators 
learn  a  basic  strategy  in  performing  the  task,  that  in  effect 
becomes  an  increasingly  skilled  template  with  the  qualities 
of  organization  and  efficiency  of  operation  [Engler,  et  al.,
1980].  Depending  on  the  task,  the  level  of  skill  required  to
perform  the  task  is  universally  measured  with  the  property  of 
variability.  Bowen,  et  al.  [1966]  found  considerable  varia¬ 
bility  as  measured  by  a  lack  of  consistency  for  all  skill 
levels  of  pilots  performing  tasks  in  an  OFT,  even  pilots  with 
substantial  flight  experience.  It  thus  appears  that  as  an
operator  begins  to  learn  a  new  task,  his  control  strategy  in
performing  the  task  is  highly  inefficient,  resulting  in  a
large  variability  of  actions.  As  skill  development  progresses, 
his  control  strategy  becomes  highly  efficient  and  effective, 
resulting  in  what  should  be  smaller  variability  of  actions. 
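
The expected decline in variability can be illustrated with a small numerical sketch. The trial data, the use of the standard deviation as the variability measure, and the function name are illustrative assumptions; they are not measures taken from the studies cited above.

```python
from statistics import pstdev


def action_variability(deviations):
    """Return the population standard deviation of one trial's deviations.

    `deviations` holds hypothetical samples of error from the desired
    value (for example, heading error in degrees) within a single trial.
    """
    return pstdev(deviations)


# Hypothetical heading errors (degrees) for an early and a late trial.
early_trial = [8.0, -6.0, 10.0, -9.0, 7.0, -10.0]
late_trial = [1.0, -0.5, 1.5, -1.0, 0.5, -1.5]

early_sd = action_variability(early_trial)
late_sd = action_variability(late_trial)

# Under the hypothesis in the text, variability should shrink with skill.
print(f"early trial SD: {early_sd:.2f} deg")
print(f"late trial SD:  {late_sd:.2f} deg")
```

A measure of this kind describes consistency only; it says nothing about whether the operator's average performance is within tolerance.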

Three  phases  of  skill  development  have  been 
hypothesized  in  earlier  research  by  Fitts  [1962]  and  discussed 
in  terms  of  aircrew  skill  acquisition  by  Smode,  et  al.  [1962] 
and  Prophet  [1976].  The  stages  of  skill  acquisition  are  dis¬ 
cussed  below. 

(1)  Early  Skill  Development.  In  this  phase  the 
student  seeks  to  develop  a  cognitive  structure  of  the  task 
in  the  form  of  discriminating  the  task  purpose,  ascertaining 
standards  of  task  performance,  and  interpreting  performance 
information  feedback.  Actions  tend  to  be  slow  and  deliberate, 
and  depend  a  great  deal  on  concentrated  attention  and  effort 
in  performing  the  task. 

(2)  Intermediate  Skill  Development.  After 
learning  the  task  purpose  and  experiencing  some  practice  at 
the  task,  the  student  begins  to  organize  his  control  strategy 
by  becoming  more  efficient  in  responding  to  signals  displayed 
to  him.  Perceptual  and  response  patterns  become  fixed  with 
less  reliance  placed  on  verbal  mediation  of  response  integra¬ 
tion  by  the  student. 

(3)  Advanced  Skill  Development.  This  phase
represents  the  higher  level  of  skill  acquisition,  where  performance
becomes  more  resistant  to  stress  and  activities  are
performed  concurrently.  The  rate  at  which  this  stage  is  reached
through  practice  differs  for  each  individual,  as  practice
on  any  complex  task  generally  spreads  individuals  out
into  stable  but  different  skill  levels  [Jones,  1970].  Navigating
an  A-6E  CAINS  aircraft  falls  into  this  category  of
complex  tasks.  This  stage  is  characterized  by  the  individual
performing  in  an  automated  manner  requiring  little  conscious
awareness  and  little  allocation  of  mental  effort  [Norman,
1976].

c.  Measurement  of  Skill  Acquisition 

Figure  3  is  a  model  developed  by  the  author  to 
illustrate  the  relationship  among  B/N  skill  acquisition,  the 
radar  navigation  task,  and  performance  measurement  and  evalu¬ 
ation.  An  understanding  of  the  model  depends  heavily  upon 
concepts  defined  and  discussed  in  Chapter  IV  and  in  the  early 
part  of  this  chapter.  This  model  will  be  used  for  the  current 
discussion  of  skill  acquisition  measurement. 

The  actual  measurement  of  skill  acquisition
through  its  various  stages  has  received  little  practical  attention
and  research,  most  likely  due  to  the  complexity  of  the
subject  and  the  difficulty  involved  in  accurately  assessing
human  performance  as  system  complexity  increases  [Glaser  and
Klaus,  1966;  Vreuls  and  Wooldridge,  1977;  Kelley  and  Wargo,
1963;  Senders,  1974].  As  shown  in  Figure  3,  through  the
process  of  learning  and  practice  of  the  task  over  several
trials,  the  student  (represented  by  the  oval  shapes  in  the
center  column)  should  progress  from  the  early  or  unskilled
state  through  the  intermediate  stage  and  into  the  skilled  or
"proficient"  stage  where  he  is  "trained."  Over  the  course  of
one  task  trial,  the  most  that  can  be  accomplished  is  to  obtain
objective  performance  measures  and  combined  measures,  and  to
obtain  a  subjective  opinion  of  the  skill  level  from  the  one
individual  well  qualified  and  skilled  in  performing  the  task:
an  instructor.  Once  this  "indirect"  measurement  takes  place,
a  comparison  is  made  between  the  objective  and  subjective
measures  and  the  predefined  performance  criteria  or  MOEs.  It
is  from  this  comparison  that  the  student's  skill  level  is
finally  evaluated,  with  severe  limitations  imposed  due  to  the
measurement  over  one  trial.  Skill  acquisition  through  the
three  stages  of  development  occurs  not  only  at  different  rates
but  over  the  course  of  several  trials.  This  fact  would  lend
support  to  a  measurement  system  that  indirectly  measured  skill
development  over  one  trial  and  used  historical  records  of  performance
to  measure  skill  development  over  several  task  trials.
The  model  shows  highly  likely,  likely,  and  very  unlikely  evaluation
results  by  assuming  that  both  criteria  and  measures  are
valid,  reliable,  and  specific  to  the  purpose  of  measuring  skill
acquisition.

Figure  3.  Skill  Acquisition  and  Performance  Evaluation  Model.


Early  conceptions  of  measuring  aircrew  skill  were
discussed  by  Smode,  et  al.  [1962]  and  Angell,  et  al.  [1964].
Both  structured  tasks  or  skills  into  a  hierarchical  model  and 
theorized  what  types  of  measures  (e.g.,  time,  accuracy,  fre¬ 
quency,  etc.)  would  be  appropriate  for  a  particular  level  of 
task  or  skill.  The  former  study  also  discussed  measurement 
at  the  three  stages  of  skill  acquisition:  (1)  Measurement  at 
the  first  stage  should  be  concerned  with  knowledge  and  task 
familiarity  as  well  as  distinctions  between  task  relevant  and 
task  irrelevant  information  and  cues,  and  a  differentiation 
between  in-tolerance  and  out-of-tolerance  conditions;  (2)  the
intermediate  stage  has  measurement  concerned  with  procedure 
learning,  the  identification  of  action  stimuli,  and  the  per¬ 
formance  of  manipulative  activities;  and  (3)  measurement  of 
highly  developed  procedural,  perceptual-discriminative  motor 
and  concept-using  skills  and  the  integration  of  these  combin¬ 
ations  into  more  complex  units  of  performance  is  of  concern. 

An  experiment  by  Ryack  and  Krendel  [1963]  based 
on  research  by  Krendel  and  Bloom  [1963]  measured  highly  skilled 
pilots  performing  a  tracking  task  using  a  laboratory  apparatus. 
The  measurement  was  based  on  a  theory  that  a  highly  skilled 
pilot  displays  consistency  of  system  performance,  is  highly 
adaptable  to  changing  dynamic  requirements,  and  performs  the 
task  with  least  effort.  The  transfer  of  the  measurement  of 
these  three  conceptualizations  of  high  pilot  skill  from  the 
laboratory  to  an  actual  aircraft  was  not  demonstrated. 

Later  conceptions  of  skill  acquisition  measurement
were  advanced  by  Welford  [1971  and  1976]  who  proposed
that  as  practice  on  a  task  increased,  the  time  required  to
perform  that  task  would  fall  exponentially.
Haygood  and  Leshowitz  [1974]  proposed  using  an  information 
processing  model  to  measure  flying  skill  acquisition.  Bittner 
[1979]  evaluated  three  methods  for  assessing  "differential 
stability":  (1)  graphical  analysis,  (2)  early  versus  late 
correlational  Analysis  of  Variance  (ANOVA),  and  (3)  Lawley
Test  of  Correlational  Equality.  That  study  recommended  graph¬ 
ical  analysis  as  a  method  of  first  choice. 
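
The exponential relationship proposed by Welford can be made concrete with a short sketch. The particular form used here (performance time decaying toward a non-zero asymptote), the parameter names, and the practice data are assumptions chosen for illustration; they are not taken from Welford's work.

```python
import math


def predicted_time(trial, first_time, asymptote, rate):
    """Time to perform on a given trial (1-based) under exponential decay."""
    return asymptote + (first_time - asymptote) * math.exp(-rate * (trial - 1))


def estimate_rate(times, asymptote):
    """Estimate the decay rate from observed times on consecutive trials.

    Fits a least-squares line to ln(time - asymptote) against trial index;
    the decay rate is the negative of the slope. Every observed time must
    exceed the assumed asymptote.
    """
    xs = list(range(len(times)))
    ys = [math.log(t - asymptote) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return -numerator / denominator


# Hypothetical task: 180 s on the first trial, approaching 60 s with practice.
times = [predicted_time(n, 180.0, 60.0, 0.3) for n in range(1, 9)]
rate = estimate_rate(times, 60.0)
print(f"estimated decay rate per trial: {rate:.3f}")
```

With real data the asymptote is not known in advance and would itself have to be estimated, which is one reason a graphical inspection of the practice curve is a sensible first step.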

Three  recent  experiments  regarding  measurement  of 
aircrew  skill  acquisition  are  noteworthy.  Vreuls,  et  al.
[1974]  used  multiple  discriminant  and  canonical  correlation
analyses  to  discriminate  between  different  levels  of  skill 
using  four  pilots  in  an  F-4E  configured  simulator.  Using  six 
pilots  in  a  UH-1B  (helicopter)  simulator,  Murphy  [1976]  in¬ 
vestigated  individual  differences  in  pilot  performance  by 
measuring  both  man-machine  system  outputs  and  pilot  control 
outputs  during  an  instrument  approach  and  landing.  This  study 
concluded  that  performance  differences  may  be  attributed  to 
crewmember  differences  in  cognitive  styles,  information  pro¬ 
cessing  abilities,  or  experience.  Pierce,  et  al.  [1979]  con¬ 
centrated  on  procedures  to  assess  cognitive  skills  through 
the  use  of  behavioral  data  on  eight  pilots  performing  F-4
aircraft  pop-up  maneuvers.  The  primary  measurement  instrument
was  an  instructor  using  subjective  ratings  that  were  validated
by  comparison  to  actual  bomb  scores.

From  this  previous  research  and  the  discussion  of 
the  skill  acquisition  measurement  model,  it  becomes  readily 
apparent  that  measurement  of  B/N  skill  acquisition  in  the  A-6E 
WST  during  radar  navigation  will  require  both  an  analytical 
foundation,  as  described  in  this  thesis,  and  empirical  vali¬ 
dation  that  would  result  from  implementation  of  the  proposed 
measurement  system.  This  section  is  concluded  with  six 
recommendations  for  the  procedure  of  skill  appraisal,  as 
discussed  by  Singleton  [1971]:

(1)  Discuss  the  skilled  activity  almost  ad  nauseam  with 
the  individuals  who  practice  it  and  with  those  to 
whom  and  for  whom  they  are  responsible.  It  is  not 
enough  to  pop  in  at  intervals,  the  investigator  must 
spend  whole  shifts  and  weeks  with  the  practitioners 
to  absorb  the  operational  climate. 

(2)  Try  to  make  this  verbal  communication  more  precise 
by  using  protocol  techniques,  critical  incident 
techniques,  good/poor  contrast  techniques,  and  so  on. 

(3)  Observe  the  development  of  the  skill  in  trainees  and
by  analysis  of  what  goes  on  in  the  formal  and  informal
training  procedures  and  in  professional  assessment.
Make  due  allowance  for  history,  tradition,  technological
change,  and  so  on.

(4)  Structure  the  activity.  Identify  the  dimensions  of 
the  percepts,  the  decision  making,  the  strategies  of 
action  and  the  overt  activities,  and  try  to  provide 
scales  of  measurement  along  each  dimension. 

(5)  Check  as  many  conclusions  as  possible  by  direct 
observation,  performance  measurement,  and  by  exper¬ 
iment. 

(6)  Implement  the  conclusions  and  provide  techniques  for
assessing  the  limitations  and  successes  of  the
innovations.

VI.  A-6E  WST  PERFORMANCE  MEASUREMENT  SYSTEM


The  A-6E  WST,  Device  2F114,  is  designed  to  provide  full 
mission  capability  for  pilot  transition  training,  B/N  tran¬ 
sition  training,  integrated  crew  training,  and  maintenance 
of  flight  and  weapon  system  proficiency  in  the  A-6E  Intruder 
aircraft.  The  WST  will  be  used  to  train  Navy/Marine  flight
crew  members  in  all  A-6E  procedures  -  ground  handling,  normal 
and  emergency  flight  modes,  communications,  navigation  and 
cross-country  missions,  tactics,  and  crew  coordination  [Read, 
1974].  Inherent  in  these  design  and  training  requirements  is
the  necessity  for  the  measurement  of  aircrew  performance  and 
the  subsequent  measurement  of  improved  performance  after 
training  has  occurred;  a  necessary  goal  for  any  training  simulator
[Knoop,  1968].  This  section  will  discuss  the  general
characteristics  of  the  WST  together  with  current  performance 
measurement  capabilities,  generic  performance  measurement 
systems  (PMS)  for  simulators,  and  current  performance  measurement
and  evaluation  practices  for  student  B/Ns  in  the  WST.

A.  WST  CHARACTERISTICS 

1.  General  Description

The  trainer  system  consists  of  the  following  elements:
Trainee  Station,  Instructor  Station,  Simulation  Area,  and
Mechanical  Devices  Room  (see  Figure  4).  The  Trainee  Station
is  an  exact  replica  of  the  A-6E  CAINS  cockpit  and  is  mounted
on  a  six  degree-of-freedom  motion  base  to  give  realistic
motion  cues.  Sound  cues,  environmental  controls,  and  controls
with  natural  "feel"  increase  the  similarity  between  WST  and
aircraft  cockpits.  Normal  and  emergency  flight  configurations
are  simulated,  together  with  all  modes  of  weapon  system  operation.

Figure  4.  Device  2F114,  A-6E  WST.

The  Instructor  Station  area  consists  of  a  wrap-around 
console  that  can  accommodate  two  principal  instructors  and  two 
assistant  instructors.  Controls  and  displays  are  utilized  by 
the  instructors  to:  (1)  set  up  and  control  the  training  prob¬ 
lem,  (2)  introduce  malfunctions  and  failures,  (3)  monitor 
trainee  actions  and  responses  to  malfunctions,  and  (4)  evaluate 
trainee  performance.  Four  interactive  CRT  displays  for  alpha¬ 
numeric  and  graphic  presentations,  together  with  repeater 
displays  of  the  VDI,  direct  view  radar  indication  (DVRI),  and
electronic  countermeasures  (ECM)  found  in  the  aircraft  are 
available  to  the  instructors  for  use  in  training. 

The  Simulation  Area  contains  the  computers  necessary
for  simulation  of  the  A-6E  CAINS  aircraft  and  its  missions.
Four  real-time  minicomputers  are  utilized,  two  for  flight,
one  for  tactics,  and  one  for  the  Digital  Radar  Land  Mass
Simulation  (DRLMS).  Magnetic  tape  units,  teletypewriter
printers,  digital  conversion  equipment,  and  the  DRLMS  are
also  contained  in  this  area.  The  DRLMS  is  designed  to  simulate
landmass  radar  return  for  the  AN/APQ-156  radar,  which
is  a  major  component  of  the  A-6  navigation/weapon  system  and
is  used  by  the  B/N  for  the  tasks  of  radar  scope  interpretation 
and  target  location. 

The  Mechanical  Devices  Room  contains  hydraulic  and 
power  equipment  for  positioning  of  the  Trainee  Station  motion 
system.  Also  provided  from  this  area  is  compressed  air  needed 
for  g-suit  and  environmental  control  requirements. 

2.  Performance  Evaluation  System

A  comprehensive  list  of  system  features  and  character¬ 
istics  is  beyond  the  scope  of  the  present  effort,  but  can  be 
found  in  the  Grumman  Aerospace  Corporation  Final  Configuration 
or  Criteria  Reports  for  the  A-6E  WST  [Blum,  et  al.,  1977,  1977, 
and  1977;  Rinsky,  1977].  Those  features  which  are  of  interest 
to  performance  evaluation  in  the  WST  are  shown  in  Figure  5  and 
discussed  below: 

a.  Program  Mission  Modes  are  provided  for  up  to  ten 
missions.  In  this  mode,  the  computer  system  automatically 
generates  and  sequences  mission  profiles.  During  this  mode, 
the  instructor  station  monitor  displays  a  listing  of  the 
mission  leg  number,  maneuver  to  be  performed,  mission  leg  end, 
parameters  to  be  monitored,  and  remarks  in  order  of  occurrence. 
Programmed  missions  are  activated  by  controls  from  the  instruc¬ 
tor  station. 

b.  Program  Mission  Mode  Critiquing  is  provided  by
instructor  selection  of  up  to  six  parameters  for  monitoring
on  each  leg  of  the  program  mission.  The  system  computes  the
difference  between  the  parameter  and  a  tolerance  value  for
the  parameter  preselected  by  the  instructor.  When  the  tolerance
is  exceeded  for  a  selected  parameter,  the  exceeded  amount
is  displayed  to  the  instructor  and  a  printout  record  is  provided
at  rates  of  every  5  seconds,  10  seconds,  15  seconds,
30  seconds,  1  minute,  and  2  minutes  on  a  printer/plotter.
Available  parameters  are  shown  in  the  left-hand  column  of
Figure  5.

Figure  5.  A-6E  WST  Performance  Evaluation  System.

c.  Procedure  Monitor  Display  is  a  mode  in  which  all 
steps  of  up  to  any  two  procedures,  normal  or  emergency,  appear 
automatically  on  the  instructor's  display  system  when  called 
up  by  the  instructor.  The  text  for  the  procedures  and  mal¬ 
functions  is  a  listing  of  the  steps  to  be  performed  by  the 
trainee  and  an  indication  of  the  elapsed  time  required  by  the 
trainee  to  complete  the  procedure.  This  mode  is  also  avail¬ 
able  during  the  Programmed  Mission  Mode. 

d.  Parameter  Recording  is  available  on  all  modes  of 
WST  operation.  A  minimum  of  six  parameters  may  be  simultan¬ 
eously  recorded  on  a  continuous  basis  as  a  function  of  time, 
and  compared  to  preselected  tolerance  values  as  discussed  in 
(b)  above.  The  parameter  record  mathematical  model  program 
is  available  in  the  flight  simulation  computer. 

e.  CRITIQUE  Mode  calculates  miss  distance  and  clock 
code  of  trainee  delivered  weapons  on  an  instructor-designated 
target  by  the  use  of  a  Scoring  Math  Model.  Bomb  circular 
probable  error  (CEP)  is  calculated  using  available  functions 
and  parameters  at  time  of  release.  Missile  releases  are
scored  as  "hit"  or  "miss"  based  on  computed  comparisons  to 
a  respective  missile  envelope.  A  CRITIQUE  display  for  a 
permanent  record  of  the  results  is  available. 

f.  Event  Recording  is  provided  where  up  to  thirty 
events  selected  by  the  instructor  are  monitored  by  the  Per¬ 
formance  Evaluation  System.  After  the  event  is  selected  for 
recording  by  the  instructor,  a  printout  is  initiated  which 
contains  a  statement  of  the  event,  other  parameter  values,
and  the  time  of  occurrence.  Table  VIII  is  a  listing  of  available
events  for  recording  along  with  other  recorded  parameters.

g.  Audio  voice  recording  with  a  time  mark  and  rapid 
recall  function  permits  the  instructor  to  access  desired  por¬ 
tions  of  trainee  headset  radio  during  and  after  a  training 
mission.  All  pertinent  communications  can  be  recorded  for  up 
to  2.5  hours. 

h.  Navigational  computations,  display  drive  signals 
and  positional  readouts  were  designed  to  be  within  0.1  nautical 
mile  of  true  position  based  on  ground  speed  and  true  course. 

i.  A  Versatec  electrostatic  Printer  Plotter  unit  is
furnished  at  the  instructor  console  area  and  has  the  capability
of  simultaneously  printing  and  plotting  parameter  recordings,
Program  Mission  parameter  recordings,  and  event  recordings
during  Free  Flight,  Program  Mission,  and  Demonstration  Maneuver/Trainee
mode  training  missions.  This  unit  can  also
print  out  any  display  type  designated  by  the  instructor  with
a  maximum  of  20  printouts  possible  during  any  2.5-hour  mission.

j.  The  tactics  computer  exercises  master  control  of
the  system  and  includes  functional  control  of  the  attack  navigation
system,  system  displays,  weapons  release  system,  in-flight
refueling  system,  ECM,  threats,  magnetic  variations,
Programmed  Missions,  dynamic  replay,  malfunctions,  instructor
display  system,  malfunction  control,  displays,  instructor
flight  control,  demonstration  maneuvers,  CRITIQUE  mode,  and
others.  This  computer  was  installed  with  future  hardware  and
software  growth  for  input-output,  memory  core,  and  computation
as  a  design  specification.

TABLE  VIII.  A-6E  WST  EVENT  RECORDING  SELECTION

     EVENT                                         OTHER EVENT PARAMETERS

 1.  Airborne                                      -
 2.  Gear-up and locked                            IAS
 3.  Flaps/Slats-up                                IAS
 4.  Stability Augmentation Engaged                ALT
 5.  Search Radar Switch Stby/on                   -
 6.  Doppler Radar Switch Stby/on                  -
 7.  Chaff emitted                                 -
 8.  Isolation Valve Switch FH/Land                -
 9.  Fuel Dump (wing or FUS) - on/secured          -
10.  Tank Pressure Switch - on/off                 -
11.  Present Position Correct Button - Depressed
12.  Computer - on/off                             -
13.  Computer Error Lite Lit                       -
14.  Master Caution Lite Lit                       -
15.  ALQ-126 Switch - Rec/Repeat                   -
16.  Reselect Lite Flashing/Steady                 -
17.  Master Arm - on/off                           -
18.  Attack; Step Into/Out of                      -
19.  Bomb Release                                  -
20.  Commit Trigger - Depressed                    -
21.  AZ Range Switch - on/off                      -
22.  Velocity Correct Switch - Memory/Off          Save
23.  Track-While-Scan; on                          -
24.  Computer - Out of Attack                      -
25.  Throttle(s) below 75 percent                  -
26.  Gear Handle Down                              IAS
27.  Flap/Slat Lever - 30/40 degrees               IAS
28.  Touchdown                                     IAS, AOA
29.  Ram Air Turbine - in/out                      -
30.  Designate - on/off                            Slant Range

Source:  Blum,  et  al.  [1977]
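
The tolerance comparison described for Program Mission Mode Critiquing (item b above) can be sketched as follows. The source does not give the computation itself, so this sketch assumes that each monitored parameter has a desired value with a symmetric tolerance band and that the "exceeded amount" is the distance outside that band. The function names and the altitude samples are hypothetical.

```python
def exceeded_amount(actual, desired, tolerance):
    """Amount by which |actual - desired| exceeds the tolerance, else 0.0."""
    return max(0.0, abs(actual - desired) - tolerance)


def critique_leg(samples, desired, tolerance, print_interval_s):
    """Return (time, exceeded amount) records for one monitored parameter.

    `samples` is a list of (time_s, value) pairs assumed sorted by time.
    A sample is examined only when at least `print_interval_s` seconds
    have passed since the last examined sample, and a record is kept
    only when the tolerance is exceeded at that moment.
    """
    records = []
    next_print_s = samples[0][0] if samples else 0.0
    for time_s, value in samples:
        if time_s >= next_print_s:
            over = exceeded_amount(value, desired, tolerance)
            if over > 0.0:
                records.append((time_s, over))
            next_print_s = time_s + print_interval_s
    return records


# Hypothetical leg: hold 500 ft within 100 ft, printout every 5 seconds.
altitude_samples = [(0.0, 520.0), (5.0, 640.0), (10.0, 590.0), (15.0, 380.0)]
leg_records = critique_leg(altitude_samples, 500.0, 100.0, 5.0)
for time_s, over in leg_records:
    print(f"t={time_s:>4.0f} s  altitude out of tolerance by {over:.0f} ft")
```

In the trainer the instructor may select up to six such parameters per leg, so a fuller sketch would apply the same check to each selected parameter in turn.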

The  A-6E  WST  appears  to  have  an  impressive  objective
performance  measurement  capability.
The  hardware  and  software  computer  system  was  designed  with 
objective  performance  measurement  in  mind,  although  no  definite 
model  or  technique  was  provided  by  the  designers  for  evaluating 
B/N  skill  acquisition  during  a  radar  navigation  mission.  The 
foundation  has  been  laid  for  objective  measurement;  all  that 
remains  is  building  a  sound  performance  measurement  structure 
based  on  principles  and  models  that  have  been  examined  and 
evaluated  thus  far. 
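
As one illustration of the objective measures already available, the miss distance and clock code reported by the CRITIQUE mode (item e above) might be computed along the following lines. The Scoring Math Model itself is not described here, so the coordinate convention (impact offset east and north of the target, in feet), the clock code convention (12 o'clock lying along the run-in heading, beyond the target), the median-based CEP estimate, and all function names are assumptions made for illustration.

```python
import math
from statistics import median


def score_impact(east_ft, north_ft, run_in_heading_deg):
    """Return (miss distance in feet, clock code 1-12) for one impact.

    east_ft and north_ft are the impact offsets from the target.
    Assumed convention: 12 o'clock points along the run-in heading.
    """
    miss_ft = math.hypot(east_ft, north_ft)
    # Bearing from target to impact, measured clockwise from north.
    bearing_deg = math.degrees(math.atan2(east_ft, north_ft)) % 360.0
    # Rotate so that the run-in heading becomes 12 o'clock.
    relative_deg = (bearing_deg - run_in_heading_deg) % 360.0
    clock = round(relative_deg / 30.0) % 12
    return miss_ft, (12 if clock == 0 else clock)


def empirical_cep(miss_distances_ft):
    """Estimate CEP as the median miss distance of a set of impacts."""
    return median(miss_distances_ft)


# Hypothetical impacts (east, north offsets in feet), run-in heading 090.
impacts = [(30.0, 40.0), (0.0, -100.0), (150.0, 0.0)]
scores = [score_impact(e, n, 90.0) for e, n in impacts]
for miss_ft, clock in scores:
    print(f"miss {miss_ft:.0f} ft at {clock} o'clock")
print(f"empirical CEP: {empirical_cep([m for m, _ in scores]):.0f} ft")
```

The median is used because CEP is the radius expected to contain half of the impacts; with only a few releases per mission such an estimate would be very coarse.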

B.  PERFORMANCE  MEASUREMENT  SYSTEMS 

Measuring  aircrew  performance  can  be  viewed  as  a  system 
within  itself.  Every  system  consists  of  an  assemblage  or 
combination  of  objects  or  parts  forming  a  complex  or  unitary
whole  with  definable  characteristics  and  a  common  purpose.

The  purpose  of  the  performance  measurement  system  (PMS) 
examined  here  will  be  to  provide  FRS  instructors  and  training 
managers  with  valid,  reliable,  and  objective  information  needed 
to  guide  decisions  about  trainee  skill  acquisition.  The  PMS 
for  the  simulator  has  definable  components,  functions,  inputs,
outputs,  communication  links,  and  procedures  that  all  interact 
to  form  a  system  that  may  or  may  not  be  efficiently  designed 
or  implemented.  Criteria  for  PMS  selection  may  outweigh  some 
desirable  system  characteristics  as  well  as  the  optimal  allo¬ 
cation  of  functions  to  a  particular  component.  After  a  PMS 
has  been  analyzed,  functions  allocated,  and  system  criteria 
selected,  implementation  of  the  PMS  within  the  operational 
environment  may  impose  further  constraints  that  cause  redesign 
of  the  system.  The  interactions  of  PMS  analysis,  functional 
allocation,  criteria,  and  implementation  are  discussed  below. 

1.  Systems  Analysis

Since  the  purpose  of  the  PMS  being  discussed  is  to 
provide  information  about  student  skill  level  to  training  per¬ 
sonnel  for  accurate  training  control  decision-making,  this 
system  can  be  viewed  from  an  information-processing  approach. 
Information  in  the  form  of  data  is  sensed  and  collected,  re¬ 
corded  and  processed,  and  presented  in  an  output  form  that  is 
useful  for  performance  evaluation  purposes.  These  functions 
of  the  system  are  interdependent  and  may  be  served  by  the 
same  component.  Major  components  of  the  PMS  are  instructors
and  computers  with  data  storage  capability.  Discussion  of 
the  PMS  analysis  follows. 

a.  Data  Sensing  and  Acquisition 

Performance  must  be  observed  to  be  sensed  and 
collected.  Performance  measurement  considerations  in  data 
collection  include  but  are  not  limited  to:  (1)  mission  purpose,
(2)  flight  regime,  (3)  maneuver  performed,  (4)  tasks,
(5)  skills  required,  (6)  operator  physiological  output  measures,
(7)  aircraft  measures,  (8)  aircrew-aircraft  system  output
measures,  (9)  mission  results,  (10)  flight  management,  (11) 
procedural  control,  (12)  aircraft  systems  management,  (13) 
operator  motivation,  and  (14)  historical  data. 

Sensing  and  collecting  performance  information  in 
a  simulator  can  be  accomplished  by:  (1)  mechanical  and  elec¬ 
tronic  devices  including  digital  computers  and  (2)  direct 
human  observation  [Angell,  et  al.,  1964].  The  first  category
is  usually  referred  to  as  "automated"  measurement  devices,
and  may  include  video/photo  recorders,  audio/digital  recorders,
timers  and  counters,  graphic  recorders,  and  plotters  [Smode,
et  al.,  1962;  Angell,  et  al.,  1964;  Obermayer,  et  al.,  1974;
Hagin,  et  al.,  1977].  Direct  human  observation  may  or  may  not
be  standardized  by  preplanned  performance  checklists  or
instructions.

Video/photo  recorders  provide  permanent  records 
of  performance  and  are  suitable  for  instructors  to  use  in 
observing  performance  more  objectively  because  of  playback
features.  Audio/digital  recorders  involve  recording  of  com¬ 
munications  and  direct  measurement  conversion  to  digital  form 
by  a  computer  for  selected  observable  parameters.  Audio 
recordings  may  be  utilized  as  a  more  objective  measurement 
for  instructor  use  but  currently  have  data  conversion  limita¬ 
tions.  Digital  recording  of  discrete  and  continuous  measures 
from  all  levels  of  aircrew-aircraft  system  performance  has 
been  demonstrated  in  both  simulators  and  in  actual  aircraft 
flight  over  the  past  twenty  years  [Wierwille  and  Williges, 
1978;  Mixon  and  Moroney,  1981].  Timers  and  counters  are  suitable
as  auxiliary  components  to  digital  computer  measurement
for  both  time  and  frequency  performance  measures  [Angell,  et 
al.,  1964].  Graphic  recorders  are  electromechanical  in  oper¬ 
ation  and  provide  continuous  records  of  event  states  and  mag¬ 
nitudes  along  a  time  continuum.  Graphic  recorders  are  usually 
either  classified  as  event  (discrete  performance)  or  continuous
(magnitude  of  continuous  variable)  [Angell,  et  al.,  1964].
Plotters  display  information  in  Cartesian  or  rectangular  coor¬ 
dinates,  and  are  useful  for  both  performance  data  collection 
and  output. 

Charles  [1978]  thoroughly  studied  the  role  of  the
instructor  in  a  simulator  and  determined  one  function  of  the
instructor  was  to  monitor  performance  in  the  form  of  student
procedures,  techniques,  skill  level,  and  simulator  performance.
Indeed,  the  direct  observation  of  student  performance  may
sometimes  be  the  sole  source  of  valuable  performance  measurement,
especially  when  unexpected  events  occur  during  a  simulator
mission  [Smode,  et  al.,  1962].  As  previously  mentioned,
video  recording  with  playback  capability  improves  the  human 
observation  data  collection  method.  Usually,  performance 
measurements  resulting  from  this  technique  must  be  converted 
for  digital  computer  use  in  the  data  playback  and  processing 
stage. 

b.  Data  Processing  and  Analysis 

Once  performance  data  are  sensed  and  collected 
by  mechanical  or  human  means,  some  conversion  is  usually  re¬ 
quired  to  make  the  raw  data  more  useful  for  the  system  purpose, 
i.e.,  to  provide  information  for  accurate  performance  evalua¬ 
tion.  Usually  all  data  are  converted  to  a  digital  format 
appropriate  to  the  general  purpose  computer.  It  is  in  this 
stage  where  computers  and  peripheral  equipment  such  as  input/ 
output  devices,  memory  core  units,  and  magnetic  tape  drives 
are  extremely  accurate,  efficient  and  cost-effective  as  com¬ 
pared  to  human  processing  of  data,  although  some  data  types 
may  not  be  convertible  to  a  digital  format  and  must  be  carried 
to  the  system  output  stage  in  raw  form.  In  this  stage, 
usually  video  recordings  are  reviewed  by  the  instructor  to 
increase  the  objectivity  of  his  direct  human  observation  of 
performance.  For  the  interested  reader,  Obermayer  and  Vreuls
[1974]  present  a  more  detailed  account  of  data  playback  and 
processing  components  and  interactions. 
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Data  Presentation 


c . 

After  data  analysis,  the  data  will  be  available 
as  output  measures  for  the  evaluation  process.  The  output 
format  may  be  numerical,  graphical,  audio,  visual,  or  some 
other  form.  Since  the  evaluation  process  involves  the  com¬ 
parison  of  performance  data  to  standards  or  criteria,  some 
of  the  performance  data  may  be  utilized  as  criteria  for  sub¬ 
sequent  evaluation  use.  Most  likely,  the  data  output  will 
be  typical  measures  of  time,  accuracy,  and  frequency  for 
various  task  levels.  Some  measures  may  be  combined  in  the 
processing  stage  and  used  as  output  data  for  comparisons  to 
established  MOEs. 

2. Allocation of Functions

One result of the systems analysis of the simulator
PMS was to identify functions that have to be performed.
Given that there may be an option as to whether any particular
function should be allocated to the human or a machine, some
knowledge of the relative capabilities of humans and machines
would be useful for determining the allocation of functions.
Some relative capabilities of mechanical devices and human
observers were discussed in the previous section, but more
detail is necessary. Using the results of McCormick [1976],
Buckhout and Cotterman [1963], Obermayer, et al. [1974],
Angell, et al. [1964], and Coburn [1973], the capabilities
of humans and machines for the purpose of performance
measurement in the simulator are presented in Table IX.

TABLE IX: HUMAN AND MACHINE CAPABILITIES AND LIMITATIONS


HUMAN  CAPABILITIES 

Detect  stimuli  against  background  of  high  noise  (CRT) . 
Recognize  patterns  of  complex  stimuli  (DVRI) . 

Sense  and  respond  to  unexpected  events. 

Store  large  amounts  of  diverse  information  for  long  periods. 
Retrieve  information  from  storage  (with  low  reliability) . 
Draw  upon  experience  in  making  decisions. 

Reason  inductively,  generalizing  from  observations. 

Apply  principles  to  solutions  of  varied  problems. 

Make  subjective  estimates  and  judgements. 

Develop  entirely  new  solutions. 

Select  only  most  important  events  for  sensing  inputs. 

Acquire  and  record  information  incidental  to  primary 
mission. 

High  tolerance  for  ambiguity,  uncertainty,  and  vagueness. 

Highly  flexible  in  terms  of  task  performance. 

Performance  degrades  gradually  and  gracefully. 

Override  own  actions  should  need  arise. 

Uses  machines  in  spite  of  design  failures  or  for  a 
different  task. 

Modify  performance  as  a  function  of  experience. 

MACHINE  CAPABILITIES 

Sense  stimuli  beyond  man's  range  of  sensitivity. 

Apply  deductive  reasoning  when  classes  are  specified. 
Monitor  for  prespecified  frequent  and  infrequent  events. 
Store  coded  information  quickly  and  in  quantity. 

Retrieve coded information quickly and accurately.

Process  quantitative  information. 

Make  rapid,  consistent,  and  repetitive  responses. 

Perform  repetitive  and  concurrent  activities  reliably. 
Maintain  performance  over  time. 

Count  or  measure  physical  quantities. 

Transfer  function  is  known. 

Data  coding,  amplification,  and  transformation  tasks. 
Large  channel  capacity. 

Not  influenced  by  social  and  physiological  factors. 

HUMAN  LIMITATIONS 

Sense  stimuli  within  a  limited  range. 

Poor  monitoring  capability  for  activities. 

Mathematical  computations  are  poor. 

Cannot  retrieve  large  amounts  of  information  rapidly  and 
reliably. 

Cannot  reliably  perform  repetitive  acts. 

Cannot  respond  rapidly  and  consistently  to  stimuli. 
Cannot  perform  work  continuously  over  long  periods. 
Requires  time  to  train  for  measurement  and  evaluation. 
Expectation  set  leads  to  "see  what  he  expects  to  see." 
Requires  review  time  for  decisions  based  on  memory. 

Does  not  always  follow  an  optimum  strategy. 

Short-term  memory  for  factual  material. 

Not  suited  for  data  coding,  amplification,  or  transfor¬ 
mation. 

Performance degraded by fatigue, boredom and anxiety.

Cannot  perform  simultaneous  tasks  for  long  periods. 

Channel  capacity  limited. 

Dependent  upon  social  environment. 

MACHINE  LIMITATIONS 

Cannot  adapt  to  unexpected,  unprogrammed  events. 

Cannot  learn  or  modify  behavior  based  on  experience. 

Cannot "reason" or exercise judgement.

Uncoded  information  useless. 

Inflexible. 

Requires  stringent  environmental  control  (computers) . 

Cannot  predict  events  in  unusual  situations. 

Performance  degraded  by  wearing  out  or  lack  of  calibration. 
Limited  perceptual  constancy  and  are  expensive. 

Non-portable.

Long-term  memory  capability  is  expensive. 

Generally  fail  all  at  once. 

Little capacity for inductive reasoning or generalization.


When  fully  exploited  with  no  other  limitations  imposed,  these 
capabilities  and  limitations  of  the  human  and  machine  define 
what  might  be  described  as  an  "optimal"  performance  measure¬ 
ment  system  from  the  engineering  standpoint.  As  Knoop  and 
Welde  [1973]  observed,  a  performance  measurement  system 
should  "capitalize  on  the  advantages  of  an  automated,  objec¬ 
tive  system  and  yet  retain  some  of  the  unique  capabilities 
afforded  by  the  human  evaluator."  For  each  task  which  is  to 
be  measured  and  evaluated,  a  decision  must  be  made  as  to 
whether  it  would  be  more  efficient  for  the  man  or  the  machine 
to measure or evaluate performance on that task [Buckhout and
Cotterman, 1963].

3. System Criteria

In  addition  to  examining  human  and  machine  capabili¬ 
ties  and  limitations  for  the  functions  and  components  of  a 
performance  measurement  system,  other  factors  with  potential 
impact  on  system  design  and  implementation  must  be  identified, 
analyzed,  and  weighed  for  importance.  Choosing  a  system 
solely  by  human-machine  advantages  does  not  take  into  account 
other  apparently  extrinsic  influences  that  may  turn  out  to  be 
deciding factors. The following listing of system criteria
for performance measurement systems was gleaned from research
by Obermayer and Vreuls [1974], Buckhout and Cotterman [1963],
Demaree and Matheny [1965], Farrell [1974], and Carter [1977]:

a. Conflicts of system purpose may exist. The PMS
is required to provide objective, reliable, and valid
information for decision-making purposes and also to identify
changes  in  student  skill  level.  A  component  may  be  the  most 
objective  choice  available  for  the  first  goal  but  inadequate 
for  the  second.  An  alternative  component  or  function  may  be 
identified  to  do  both  satisfactorily. 

b.  Data  should  be  provided  in  a  useful  form  for 
evaluation  purposes. 

c.  Data  collection,  processing,  and  presentation  must 
be  timely  enough  to  enhance  the  training  process  in  the  form 
of  knowledge  of  results. 

d.  Costs  to  modify  or  supplement  equipment  and  soft¬ 
ware  must  be  weighed  against  the  utility  of  the  information 
derived. 

e.  Data  distortion  must  be  controlled  for  accurate 
and  reliable  results. 

f.  Minimum  interference  with  the  training  process 
should  occur  with  the  measurement  system  having  an  inconspic¬ 
uous  role  requiring  little  or  no  attention  from  the  student 
or  instructor. 

g.  Social  impacts  of  any  system  may  have  adverse 
effects  on  morale  or  personnel  involvement.  If  an  instructor 
perceives  that  automated  performance  measurement  is  a  replace¬ 
ment  to  his  traditional  role  as  an  evaluator,  the  effectiveness 
of  the  measurement  system  will  be  greatly  reduced. 

h. Economic and political constraints may affect
system design. The ideal measurement device may not be
recommended for procurement by higher authority, while some
selection of components is based on available equipment at
time of procurement.

i.  Other  factors  such  as  size,  weight,  safety,  ease 
of  use  and  reliability  should  also  be  considered. 

System criteria should be applied so that the components
and functions selected for a performance measurement system
maximize those criteria that are advantageous and minimize
those aspects that are not optimum for the system purpose.
These criteria must be taken into account during allocation
of functions for the performance measurement system, and must
be weighed at least qualitatively, if not quantitatively, for
overall contribution to the final system configuration.

4. System Implementation

Once  the  objectives  of  the  performance  measurement 
system  are  identified  and  the  allocation  of  functions  and 
system  criteria  are  applied,  the  system  model  then  requires 
a  deliberate  implementation  procedure  if  it  is  to  produce 
meaningful  results  and  have  utility  to  the  end  user.  Waag, 
et  al.  [1975]  identified  four  phases  of  development  for  imple¬ 
mentation  of  the  measurement  system  in  the  simulator: 

(1)  Definition  of  criterion  objective  in  terms  of  a 
candidate  set  of  simulator  parameters. 

(2) Evaluation of the proposed set of measures for the
purpose of validation and simplification.

(3) Specification of criterion performance by requiring
experienced instructor aircrew to fly the maneuver
in question.

(4)  Collection  of  normative  data  using  students  as 
they  progress  through  the  training  program. 

When using automatic or human measurement components
within the performance measurement system, other
implementation considerations should apply. These are
discussed below.

a. Automatic Measurement Considerations

In simulator environments, the organization of
the software will be the key to successful implementation of
flight training measurement systems [Vreuls and Obermayer,
1971]. Extensive research into programming techniques for
the automatic monitoring of human performance in the flight
regime has been accomplished [Knoop, 1966 and 1968; Vreuls
and Obermayer, 1974; Vreuls, et al., 1973, 1974, and 1975].
Knoop [1968] examined some of the prerequisites for
automatically monitoring and evaluating human performance:

(1) Knowledge is required of which performance variables
are important in evaluating an operator's proficiency.

(2)  Knowledge  is  required  of  how  these  variables  should 
be  related  for  optimal  performance. 

(3)  A  digital  computer  program  is  required  which  com¬ 
pares  actual  relationships  among  these  variables 
during  performance  with  those  required  for  optimal 
performance  to  evaluate  operator  proficiency. 

These prerequisites point out the need for careful front-end
analysis of the system in terms of performance measures and
criteria, and the complex problem involved in programming
this analysis for automatic measurement and evaluation.

b. Human Measurement Considerations

Smode  [1962]  provides  some  rules  for  enhancing 
the  validity  and  reliability  of  resulting  measurement  where 
human  observers  are  employed  in  data  collection: 

(1) Provide standardized checklists that specify what
to observe, when, and how often observations are
recorded.

(2) Train observers for the measurement process to ensure
full understanding of how to make the observations.

(3) Provide data collection sheets that conveniently
indicate what is to be observed and the sequence
of observation.

(4) Avoid overloading the observer with too much
simultaneous observation and recording.

(5) Data collection forms should have notation or
symbology for recording observations, when feasible,
and should result in a permanent record that can
be easily transformed into a form for rapid analysis.

These  guidelines  still  appear  sensible  today,  with  perhaps 
some  additional  information  about  the  relationship  between 
automatic  measurement  and  the  human  observer  being  established 
and  provided  within  the  system. 

C.  CURRENT  PERFORMANCE  MEASUREMENT  IN  THE  WST 

Navy B/N replacement training is conducted by both the
East Coast FRS, Attack Squadron Forty-Two (VA-42), and the
West Coast FRS, Attack Squadron One Twenty Eight (VA-128).
Each FRS has developed and maintains its own training program;
however, these programs are similar in nature and utilize
virtually the same performance measurement and evaluation
techniques. Each training program is divided into specific
phases designed to develop certain skills such as navigation,
system operation, or attack procedures, and each uses the
media of classroom lecture, simulator, or actual aircraft
flight. A building-block approach to developing progressive
knowledge of skills is utilized, including training missions
in the WST. This study will focus on a Category One (CAT I)
B/N student, where entry skills and knowledge for measurement
in the WST are minimal.

Determination  of  the  skill  level  of  CAT  I  B/Ns  performing 
simulated  missions  in  the  A-6E  WST  continues  to  be  based  on 
subjective  judgements  made  by  instructor  B/Ns  and  pilots.  A 
syllabus  of  progressively  more  difficult  flights  in  the  WST 
is  part  of  the  training  curriculum.  During  or  shortly  after 
each  flight,  the  instructors  "grade"  the  student  on  tasks 
performed  employing  mostly  personal  criteria,  based  on  exper¬ 
ience  and  normative  comparisons  of  the  student's  performance 
with  other  student  performances.  Table  X  is  a  compilation  of 
tasks  taken  from  B/N  flight  evaluation  sheets  for  the  VA-42 
simulator  curriculum  for  which  B/N  performance  is  graded  on 
a  scale  using  four  categories:  (1)  unsatisfactory,  (2)  below 
average,  (3)  average,  and  (4)  above  average.  A  typical  B/N 
flight  evaluation  sheet  for  the  WST  is  shown  in  Figure  6. 

The  four  ratings  listed  above  are  then  converted  to  a  4.0 
scale  for  numerical  analysis  and  student  rankings. 
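
The conversion of the four rating categories to a 4.0 scale can be illustrated with a short sketch. The text does not give the exact grade-point mapping, so the values below simply reuse the category numbers (1) through (4) as an assumption; the abbreviations and the sample sheet are also hypothetical.

```python
from statistics import mean

# Assumed mapping: the category numbers (1)-(4) from the text are reused
# as grade points. The actual FRS conversion is not stated in the source.
GRADE_POINTS = {"UN": 1.0, "BA": 2.0, "A": 3.0, "AA": 4.0}


def flight_score(ratings):
    """Average the grade points of all graded items on one evaluation sheet."""
    if not ratings:
        raise ValueError("no graded items on this sheet")
    return mean(GRADE_POINTS[rating] for rating in ratings.values())


# Hypothetical sheet with four graded items.
sheet = {
    "Departure procedures": "A",
    "Approach": "AA",
    "Communications": "BA",
    "Headwork": "A",
}
score = flight_score(sheet)
```

An unweighted average is the simplest choice; a squadron could equally weight items by importance, which this sketch does not attempt.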

During  a  personal  visit  by  the  author  to  the  VA-42 
A-6E WST in June 1980, subjective performance measurement

TABLE X: CURRENT B/N TASKS GRADED IN WST CURRICULUM


Aggressiveness 

Aircraft  familiarity 

Aircraft  system  knowledge  NATOPS 

Aircraft/system  operations 

Aircraft/system  turnup 

Aircraft/system  utilization 

ALE-39 awareness

ALQ-126  awareness 

ALR-45/50  awareness 

AMT  I 

Approach 

Approach  (TACAN,  GCA,  ASR) 

Attack  procedures/ACU  knowledge 

Attitude 

Basic  air  work 

BINGO  procedures 

Communications 

Computer  operation 

Crew  concept 

Degraded  aircraft/system  operation 
Degraded  aircraft/system  utilization 
Degraded  system  CEP 
Degraded  system  utilization 
Departure 

Departure  procedures 
ECM  equipment  knowledge 
ECM  tactics 
Emergency procedures
Flight  briefing 

Fuel  management 

Full  system  utilization 

General  attack  CEP 

Glideslope  control 

Headwork 

HI  Loft  attack  CEP 
Impact  accuracy 
Knowledge  of  the  cockpit 
LABS  attack  CEP 
Landing  transition 
Line-up  corrections 
Low  level  navigation 
Marshall  pattern 
Mining  procedures 
NATOPS 

Navigation  procedures 
NORDO  procedures 
Normal  procedures 
Planning/preparation 
Point  checks 

Post  landing/shutdown  procedures 

Post  start 

Prestart 

Radar  interpretation 
Radar  operation 
S-1 pattern
S-3 pattern

Shipboard  procedures 
SRTC  utilization 
Stall  series 
Start 

Start/point  checks 
Straight  path  attack  CEP 
System  shutdown 
System  turnup 
Takeoff/departure
Target  procedures 

Targets  (geographical  turn  point  listed) 

Turn  point  acquisition 

UHF  communications 

Use  of  checklists 

Use of the clock

B/N FLIGHT EVALUATION SHEET
>>> BCW01 <<<

REPLACEMENT: ________   INSTRUCTOR: ________   BUNO: ________
DISPOSITION CODE: ________
FLIGHT TIME:  TOTAL ____   NIGHT ____   INST ____
BRIEF TIME ____   T/O TIME ____   DEBRIEF ____

     ITEM                          UN    BA    A     AA
 1.  DEPARTURE PROCEDURES
 2.  MARSHALL PATTERN
 3.  APPROACH
 4.  GLIDESLOPE CONTROL
 5.  LINE-UP CORRECTIONS
 6.  COMMUNICATIONS
 7.  NORDO PROCEDURES
 8.  BINGO PROCEDURES
 9.  EMERGENCY PROCEDURES
10.  HEADWORK
11.  CREW CONCEPT
     TOTAL

COMMENTS:

SIGNATURE:                         DATE:

Figure 6. Typical B/N Flight Evaluation Sheet for WST

and evaluation was exclusively being conducted for training
missions  of  students  in  the  WST.  The  use  of  subjective  per¬ 
formance  measurement  and  evaluation  was  due  mainly  to  the 
newness  of  the  simulator  and  the  traditional  and  acceptable 
role  that  subjective  methods  have  played  for  the  last  three- 
quarters  of  a  century  across  all  aviation  communities.  Oper¬ 
ational  FRS  personnel  rarely  have  the  time  to  carefully 
analyze  and  employ  new  performance  measurement  models  and 
techniques  or  utilize  new  systems  that  are  incorporated  into 
a newly-delivered simulator. One purpose of this thesis is
to fill that gap by carefully analyzing and evaluating
performance measurement models for operational use.

Due to the exclusive use of subjective performance methods,
standards of performance in the A-6 FRS are established
analytically  based  on  perceptions  by  the  instructors  on  what 
constitutes  "proficient"  or  "skilled"  performance.  Bombing 
and  radar  target  identification  (RTI)  criteria  are  used 
internally,  with  RTI  grading  based  on  target  difficulty  and 
the  replacement  B/N's  level  of  exposure  and  experience. 
Essentially,  RTI  criteria  use  a  trinomial  division  of  "hit 
the  target,"  "in  ball  park,"  and  "out  of  ball  park"  that  has 
been  defined  well  enough  to  be  converted  to  a  numerical  grade. 
In  addition,  the  replacement  B/N  must  identify  75  percent  of 
the  assigned  targets  on  a  specific  radar  navigation  "check 
flight"  as  a  criterion  for  radar  navigation  skill. 
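
The 75 percent check-flight criterion reduces to a simple proportion test. The sketch below assumes each assigned target is recorded only as identified or not identified; how the "in ball park" category counts toward identification is not stated in the text, so that decision is left to whoever builds the input list. Function names are hypothetical.

```python
def identification_rate(identified):
    """Fraction of assigned targets that the B/N identified."""
    if not identified:
        raise ValueError("no targets were assigned")
    return sum(identified) / len(identified)


def meets_check_flight_criterion(identified, required=0.75):
    """True when the identification rate reaches the required proportion."""
    return identification_rate(identified) >= required


# Hypothetical check flight: eight assigned targets, six identified.
check_flight = [True, True, True, False, True, True, False, True]
```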

The proposed model for measuring B/N performance during
radar navigation in the WST, to be discussed in Chapter VII,
incorporates the best qualities of both subjective and
objective measurement, and uses the results to provide
accurate and valid information for making decisions about
student progress within the training process. Under the
proposed model, criteria for successful performance will be
established empirically for operational use by either the
A-6 FRS or the A-6 community as a whole.

VII. RESULTS AND DISCUSSION


The  purpose  for  designing  a  performance  measurement  system 
for  the  B/N  during  radar  navigation  in  the  A-6E  WST  was  to 
provide  objective,  reliable,  valid,  and  timely  performance 
information  for  accurate  decision-making  to  the  training  mis¬ 
sion  instructor  and  FRS  training  manager.  This  section  pre¬ 
sents  the  performance  measurement  system  model  developed  from 
the  previous  analysis  of  related  aircrew  performance  literature, 
generic  performance  measurement  systems  concepts,  the  B/N  radar 
task  analysis  and  MTLA,  and  the  A-6E  crew-system  network  model. 
The  model  developed  is  specific  from  the  standpoint  of  identi¬ 
fying  what  to  measure,  when  to  measure,  scaling,  sampling 
frequency,  criteria  establishment,  applicable  transformations, 
observation  method,  current  availability  in  the  A-6E  WST,  and 
the  accessibility  of  the  measure  if  not  currently  available. 

The proposed model embodies: (1) the establishment of standards
of performance for the candidate measure set by utilizing
fleet-experienced and motivated A-6E aircrews performing
well-defined radar navigation maneuvers and segments,
(2) techniques for reducing the candidate measures to a small
and efficient set by statistical analysis, (3) evaluation
methods which use the results from established performance
standards and performance measurement of student B/Ns for
decision analysis by the FRS instructor and training manager,
and (4) some performance measurement informational displays
which present diagnostic and overall evaluation results in a
usable and efficient format.

A.  CANDIDATE  MEASURES  FOR  SKILL  ACQUISITION 

Using  the  candidate  performance  measure  metrics  derived 
from  the  B/N  task  analysis  (Appendix  C)  and  previous  aircrew 
research  (Table  II) ,  a  composite  list  of  candidate  measures 
for  B/N  skill  acquisition  is  presented  as  Table  XI.  Informa¬ 
tion  is  provided  for  each  measure  in  terms  of  the  method  of 
measurement,  measure  segment,  scaling,  sampling  rate,  criteria 
establishment,  transformations,  availability  in  the  A-6E  WST, 
and  accessibility  in  the  A-6E  WST  (if  not  available) .  Each 
of  these  terms  is  defined  below. 

1. Method of Measurement

The measure is obtained either electronically (E), by
instructor observation (O), or both. The primary basis for
determining the best method was both the measure selection
criteria (Chapter IV) and human and machine capabilities and
limitations (Table IX).

2. Measure Segment

This  is  the  period  of  time  or  segment  of  flight  in 
which  the  measure  should  be  observed.  "ENTIRE  LEG"  defines 
the  radar  navigation  segment  from  TP  to  TP,  "MISSION"  defines 
the  segment  from  takeoff  to  landing,  and  "TP"  defines  within 
one minute of approaching the TP.

TABLE XI: CANDIDATE PERFORMANCE MEASURES

3. Scale


Scale  shows  the  numerical  limits  and  units  of  the 
measure,  e.g. ,  seconds,  feet,  miles,  or  if  units  are  unassigned, 
the  values  that  the  measure  is  assumed  to  take  on  over  the 
measure  segment.  "0/1"  defines  a  dichotomous  "not  occurred/ 
occurred"  situation.  Subjective  scales  are  listed  separately 
(e.g.,  "effectiveness  of  communication"). 

4. Sampling Rate

The  sampling  rate  is  the  rate  of  measurement  defined 
by  time  or  by  measure  segment.  Determination  was  based  on 
research  by  Vreuls  and  Cotton  [1980] . 

5. Criteria

The recommended method of establishing performance
criteria for the performance measure is defined as "EMP" for
empirical, "OPER" for operational (subjective determination
by fleet aircrew), or both.

6. Transformation

A recommended mathematical or statistical process for
the measure is provided based on the literature review by
Mixon and Moroney [1980]. A caution is provided that the
determination of the measure's distribution would be in order
before applying any transformations, as previously discussed
in Chapter IV. Transformations are listed as TIME, FREQ
(frequency), ACC (accuracy), ME (mean), MO (mode), PROPORTION
(of successes to total number), MIN (minimum), MAX (maximum),
RMS (root mean squared error), or N/A (not applicable; if
measured for computational purposes only). Other
transformations may be possible; see Table IV.
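
Several of the listed transformations can be computed directly from a series of samples. The following is a minimal sketch only: the altitude samples, the reference value, and the use of a tolerance band to define a "success" for PROPORTION are all assumptions made for illustration and do not come from Table IV or Table XI.

```python
import math
from statistics import mean, mode


def rms_error(samples, reference):
    """Root mean squared error of the samples about a reference value."""
    return math.sqrt(mean((s - reference) ** 2 for s in samples))


def summarize(samples, reference, tolerance):
    """Apply the ME, MO, MIN, MAX, RMS and PROPORTION transformations."""
    successes = sum(abs(s - reference) <= tolerance for s in samples)
    return {
        "ME": mean(samples),
        "MO": mode(samples),
        "MIN": min(samples),
        "MAX": max(samples),
        "RMS": rms_error(samples, reference),
        # Assumed definition: a sample is a success when it lies within
        # the tolerance band around the reference value.
        "PROPORTION": successes / len(samples),
    }


# Hypothetical altitude samples (feet) against a 500-foot reference.
altitude_ft = [500, 520, 480, 500, 540]
result = summarize(altitude_ft, reference=500, tolerance=25)
```

As the caution above notes, the distribution of a measure should be examined before any of these summaries is relied upon; the mean and RMS error in particular are sensitive to outliers.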

7. Currently Available

The measure exists within the A-6E WST PMS, as listed
in Table VIII or Figure 5. If blank, the measure is not
available.

8. Accessible

If  not  currently  available  in  the  A-6E  WST  PMS,  a 
determination  was  made  as  to  the  feasibility  of  incorporating 
the  measure  into  the  existing  WST  PMS  with  a  minor  software 
change.  If  blank,  major  changes  may  be  required  in  the  WST 
PMS  to  facilitate  the  accessibility  of  the  measure. 
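
The eight descriptors defined above map naturally onto a single record per candidate measure. The sketch below is one possible representation, not the layout of Table XI; the field names and the sample entry are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CandidateMeasure:
    name: str
    method: str            # "E", "O", or "E/O"
    segment: str           # "ENTIRE LEG", "MISSION", or "TP"
    scale: str             # units or value range, e.g. "seconds" or "0/1"
    sampling_rate: str     # by time or by measure segment
    criteria: str          # "EMP", "OPER", or "EMP/OPER"
    transformation: str    # e.g. "ME", "RMS", "PROPORTION", "N/A"
    available: bool        # exists in the current WST PMS
    accessible: Optional[bool] = None  # only meaningful if not available


def needs_major_change(measure):
    """True when the measure is neither available nor accessible."""
    return not measure.available and not measure.accessible


# Hypothetical entry; it is not taken from Table XI.
time_error = CandidateMeasure(
    name="Time error at TP",
    method="E",
    segment="TP",
    scale="seconds",
    sampling_rate="once per TP",
    criteria="EMP",
    transformation="ME",
    available=False,
    accessible=True,
)
```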

It is recognized that the resultant candidate measure set
contains redundant and perhaps overlapping measures, but
analytical derivation is necessary before empirical analysis
is possible. Only "objective" aspects of performance were
listed; the determination of subjective components (e.g.,
motivation) and their measurement is a subject for further
research.

B.  ESTABLISHMENT  OF  PERFORMANCE  STANDARDS 

1. Radar Navigation Maneuver

The  radar  navigation  maneuver  consists  of  multiple 
"legs"  of  varying  distances  between  predesignated  turn  points 
or  targets  that  are  selected  using  radar  significant  criteria. 
Each turn point (TP) must be reached within a criterion distance
at a predetermined time, airspeed, heading, and altitude
for the success of the air interdiction mission. Some TPs
are  more  difficult  than  others  to  detect,  identify  and  suc¬ 
cessfully  fly  over  using  the  A-6E  CAINS  navigation  system. 
Operationally,  a  TP  is  "crossed"  or  "reached"  when  the  actual 
position  of  the  TP  passes  more  than  ninety  degrees  abeam  the 
actual  aircraft  position.  This  operational  definition  is 
independent  of  the  DVRI  moving  bug  cue  used  by  the  B/N,  as 
based  on  cursor  intersection  on  the  perceived  TP  radar  return. 
This  operational  definition  of  TP  "passage"  can  easily  be  con¬ 
verted  mathematically  using  Boolean  functions  for  use  in  the 
A-6E  WST  performance  measurement  system. 
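
The TP passage definition can be expressed as a single Boolean test. The sketch below assumes flat-earth east/north coordinates and uses aircraft heading as the reference direction; whether heading or ground track is the appropriate reference is not specified in the text, and the function name is hypothetical.

```python
import math


def tp_passed(ac_east, ac_north, heading_deg, tp_east, tp_north):
    """Return True once the TP lies more than 90 degrees off the nose.

    Positions are flat-earth east/north offsets in any common unit
    (an assumption of this sketch). Heading is in degrees, measured
    clockwise from north.
    """
    # Unit vector along the aircraft heading.
    hx = math.sin(math.radians(heading_deg))
    hy = math.cos(math.radians(heading_deg))
    # Vector from the aircraft to the turn point.
    dx = tp_east - ac_east
    dy = tp_north - ac_north
    # A negative projection onto the heading vector means the TP has
    # moved behind the abeam line, i.e. more than 90 degrees off the nose.
    return (dx * hx + dy * hy) < 0.0
```

Testing the sign of the projection avoids computing and normalizing a relative bearing, and it behaves the same whether the TP passes to the left or to the right of the aircraft.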

Appropriate  radar  navigation  routes  can  be  planned  by 
either  FRS  or  Medium  Attack  Wing  (MATWING)  personnel  and  pro¬ 
grammed  into  the  WST.  This  task  is  simplified  by  the  current 
capability  of  the  simulator  to  facilitate  preplanned  mission 
routes  for  training  purposes. 

2. Simulator "Intruder Derby"

A  competitive  exercise  is  currently  conducted  on  an 
annual  basis  using  actual  A-6E  TRAM  or  CAINS  aircraft  that 
perform  radar  navigation  maneuvers  in  both  East  and  West  Coast 
A-6 communities. A similar competition could be applied in
the  A-6E  WST  using  the  programmed  radar  navigation  routes  as 
previously  discussed.  Each  route  is  then  flown  by  fleet- 
experienced  aircrew  on  a  competitive  basis  under  the  cognizance 
of  the  appropriate  MATWING  command  with  performance  measured 
by the proposed model. Using fleet aircrew that are carefully
selected by individual squadron Commanding Officers almost
guarantees motivated and skilled performance during the radar
navigation routes due to the intrinsic importance placed upon
the competition results as an aid in determining which A-6
squadron is the most "excellent" in each MATWING community.

The  performance  results  of  the  simulator  "Intruder 
Derby"  would  be  most  useful  for  establishing  standards  of 
performance  for  each  radar  navigation  route.  Establishing 
standards  in  this  manner  for  comparisons  of  performance  to 
other groups is both feasible and operationally acceptable.
The use of "ideal" flight path performance criteria lacks this
acceptance among operational FRS and fleet aviators, as
"ideal" performance may not be achieved by even the most
highly skilled aviator in a consistent manner. The
fleet-established standards of performance would be carefully
analyzed and performance limits set by operational personnel
for those performance dimensions which are within or
approaching the performance standards of the fleet.
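
One way fleet results might be turned into performance limits is sketched below. The text leaves the setting of limits to operational personnel, so the rule used here (fleet mean plus or minus k sample standard deviations) is purely an illustrative assumption, as are the function names and the sample data.

```python
from statistics import mean, stdev


def performance_limits(fleet_scores, k=2.0):
    """Return (low, high) limits as the fleet mean +/- k standard deviations.

    The k-sigma rule is an assumption of this sketch, not a rule
    given in the source.
    """
    centre = mean(fleet_scores)
    spread = stdev(fleet_scores)
    return centre - k * spread, centre + k * spread


def within_fleet_standard(score, limits):
    """True when a student's score falls inside the fleet-derived limits."""
    low, high = limits
    return low <= score <= high


# Hypothetical TP miss distances (nautical miles) from fleet aircrews.
fleet_miss_nm = [0.20, 0.30, 0.25, 0.40, 0.35]
limits = performance_limits(fleet_miss_nm)
```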

C.  MEASURE  SELECTION  TECHNIQUES 

Various  empirical  methods  and  models  have  been  formulated 
and  applied  toward  reducing  a  list  of  candidate  measures  to  a 
small,  efficient  set  with  the  characteristics  of  reliability, 
validity,  objectivity  and  timeliness.  One  technique,  employed 
successfully by the Air Force for air combat maneuvering (ACM)
performance measurement, used univariate and multivariate
analysis techniques to find the smallest comprehensive set
of measures which discriminated skill differences in "novice"
and "expert" ACM pilots in one-versus-one free engagements
[Kelly, et al., 1979]. Using multivariate analysis,
correlational analysis, regression analysis, and ridge-adjusted
discriminant analysis, an original set of twenty-seven
candidate performance measures was reduced to a final set of
sixteen measures that were:

(1)  Sensitive  to  differences  in  pilot  ACM  skill  level. 

(2) Diagnostic of performance proficiencies and
deficiencies.

(3)  Usable  by  instructor  pilots  and  compatible  with 
their  judgements. 

(4)  Capable  of  providing  results  immediately  after 
the  end  of  the  engagement. 

(5)  Compatible  with  current  projected  training  and 
measurement  hardware. 

These statistical analysis techniques appear to be appropriate
for application to the measurement of B/N performance during
radar navigation in the A-6E WST. Computer programs have been
developed and are available at minimum cost for possible software
alterations to the current performance system in the WST.
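
As a rough illustration of the univariate screening step only, the following Python sketch ranks candidate measures by how strongly they separate a "novice" group from an "expert" group. It is not the ridge-adjusted discriminant analysis of Kelly, et al.; the use of Welch's t statistic, the threshold of 2.0, the measure names, and all data values are assumptions made for the example.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(novice, expert):
    """Welch's t statistic for the difference between two group means."""
    se = sqrt(variance(novice) / len(novice) + variance(expert) / len(expert))
    return (mean(expert) - mean(novice)) / se


def screen_measures(scores, threshold=2.0):
    """Keep measures whose |t| exceeds an assumed screening threshold.

    scores maps a measure name to (novice_values, expert_values).
    """
    kept = {}
    for name, (novice, expert) in scores.items():
        t = welch_t(novice, expert)
        if abs(t) > threshold:
            kept[name] = t
    return kept


# Hypothetical candidate measures and values, for illustration only.
scores = {
    "tp_miss_distance_nm": ([2.1, 1.8, 2.5, 2.2, 1.9], [0.6, 0.8, 0.5, 0.7, 0.9]),
    "scope_brightness_adjustments": ([4, 6, 5, 7, 5], [5, 6, 4, 6, 5]),
}
kept = screen_measures(scores)
```

In this made-up data only the miss-distance measure separates the two groups, so it alone survives screening. A multivariate step would still be needed to remove measures that are redundant with one another.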

D.  EVALUATION  METHODS 

Once a small and efficient set of performance measures
has been derived using suitable statistical techniques, measurement
and evaluation of student performance may then occur.
The proposed performance measurement model uses the results
of a student's performance compared to fleet performance as
usable information for several decision levels. The student's
operation of the radar on each leg of the radar
navigation maneuver is measured and evaluated in terms of the
underlying skill level of the student, leading to a decision
of "proficient" or "not proficient" for that task. Decisions
on quality of performance must also be made for global indices
of navigational skill, e.g., fuel management or time-on-TP
management. Students just learning the task are expected to
be "not proficient" when compared to fleet performance on the
same mission, whereas students near the end of scheduled training
are expected to meet fleet standards.

An evaluation technique proposed by Rankin and McDaniel
[1980] that uses a sequential method of making statistical
decisions can use both objective and
subjective performance results for more accurate and potentially
less costly training evaluation. This decision model
focuses on proportions of "proficient" trials, where "proficient"
is determined by the instructor using either subjective
evaluation or objective standards established prior to performance.
The model sequentially samples performance during the
training of a particular task or maneuver and uses the
previously sampled performance results to eventually terminate
training for that particular task or maneuver.

Figure 7 illustrates the proposed sequential sampling
decision model using the task of navigating the A-6E aircraft
to a radar navigation TP as an example. For this particular
radar navigation route, the student must navigate to each of eight
TPs within an empirically established criterion radial distance.
The instructor must use the performance
measurement results to evaluate the student's actual performance
on each TP and assign a "proficient" (P) or "not proficient"
(1) score for each TP, where each TP is considered
a trial. The figure shows "proficient" (P) trials plotted
against total trials and indicates in this example that three
TPs were successively and accurately navigated, followed by a
missed fourth TP and ending with the remaining four TPs successfully
navigated. The regions of "proficient," "undetermined,"
and "not proficient" are derived statistically; more
detail on their actual calculation is presented in Appendix
E. As shown in the figure, the student has "mastered"
the important task of navigating to a TP on his eighth trial.
This information can then be used by the training manager in
deciding whether the student has actually mastered the task
and needs to progress to more difficult tasks, or whether the
student has not mastered the task in a previously established,
statistically based number of training trials and needs remedial
training for that task. For this example, the training
manager could safely determine that the student is ready for
training on tasks other than accurately navigating to a TP.

Figure 7. Sequential Sampling Decision Model for TP Navigation Task.


This sequential sampling decision model has been previously
used in educational and training settings. Ferguson
[1969] used the sequential test to determine whether individual
students should be advanced or given remedial assistance
after they completed instructional learning modules, and
Kalisch [1980] employed the model for an Air Force Weapons
Mechanics Training Course (63ABR46320) conducted at Lowry Air
Force Base, Colorado. Both applications resulted in greater
test efficiency than for tests composed of a fixed number of
items and substantially reduced testing time.

As discussed in Chapter IV (Table V), the costs
associated with training manager decisional errors require
that statistical and systematic methods be employed to measure
and evaluate student performance. The sequential sampling plan
accomplishes this by fixing the error rates (Types I and II,
as previously discussed in Table V) and allowing the number
of trials to vary according to the performance demonstrated
by the student. This evaluation decision model is currently
being integrated into the training program of a helicopter FRS
that uses only subjective determinations of "proficient" by
instructors [Rankin and McDaniel, 1980].
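
The idea of fixing the two error rates and letting the number of trials vary can be sketched with a Wald-style sequential probability ratio test on the proportion of "proficient" trials. This is only a minimal sketch: the function name and the values p0 = 0.5, p1 = 0.9, and alpha = beta = 0.1 are assumptions chosen for illustration, not the parameters derived in Appendix E.

```python
from math import log


def sprt_decision(trials, p0=0.5, p1=0.9, alpha=0.1, beta=0.1):
    """Return (decision, trial_number) for a sequence of "P"/"N" scores.

    p0: assumed proportion of proficient trials for an unskilled student
    p1: assumed proportion of proficient trials for a skilled student
    alpha: chance of deciding "proficient" when the true proportion is p0
    beta: chance of deciding "not proficient" when the true proportion is p1
    """
    upper = log((1 - beta) / alpha)   # crossing it ends training as proficient
    lower = log(beta / (1 - alpha))   # crossing it signals remedial training
    llr = 0.0                         # running log-likelihood ratio
    for n, score in enumerate(trials, start=1):
        if score == "P":
            llr += log(p1 / p0)
        else:
            llr += log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "proficient", n
        if llr <= lower:
            return "not proficient", n
    return "undetermined", len(trials)


# Three successful TPs, one miss, then four more successful TPs.
result = sprt_decision("PPPNPPPP")
```

With these assumed values the sequence from the TP navigation example stays in the "undetermined" region through the seventh trial and crosses into "proficient" on the eighth. Different parameter choices move the boundaries and therefore the trial at which a decision is reached.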

E.  INFORMATION  DISPLAYS 

This section discusses some displays proposed for use by
the instructor at the A-6E WST console for the purpose of performance
evaluation. Several classes of performance measures
are represented. Figure 8 shows a time activity record for
the B/N's interactive control of the A-6E CAINS navigation
system during one leg of a radar navigation maneuver. This
type of display tells which equipment was being operated, when,
and for how long. From this, one may infer what
particular task was being accomplished at a particular time
during a radar navigation leg. Time activity records of fleet
performance may be used as a performance standard by simply
preparing a transparent overlay to show means and ranges of
activity by the fleet performing the same radar navigation
leg. This comparison provides diagnostic information for the
individual student with regard to efficient or appropriate
operation of the complex navigation system of the A-6E.

Specific tasks may be measured directly and displayed as
shown in Figures 9 and 10. The tasks of time and fuel management
are measured in the simulator by comparing planned time
and fuel values with actual time and fuel values at each turn
point. This presumes that the instructor can enter the
student-planned values into the simulator
computer prior to the simulated mission. Trends may show
decisional errors not otherwise detectable by an instructor.
Again, simple overlays allow an individual student's performance
to be compared with fleet performance standards, as
well as with the performance of other students, to facilitate
evaluation.

Figure 8. Time Activity Record Display.

Figure 9. Time-on-TP Management Display.

Figure 10. Fuel Management Display.
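
The planned-versus-actual comparison behind the time and fuel management displays can be sketched in a few lines of Python. The function names, the units, and every number below are hypothetical and serve only to show how a trend across turn points might be detected.

```python
def deviations(planned, actual):
    """Actual minus planned value at each turn point."""
    return [a - p for p, a in zip(planned, actual)]


def is_growing(values):
    """True when the absolute deviation increases at every successive TP."""
    sizes = [abs(v) for v in values]
    return all(later > earlier for earlier, later in zip(sizes, sizes[1:]))


# Hypothetical planned and actual values at four turn points.
planned_time_s = [300, 640, 1010, 1400]
actual_time_s = [305, 655, 1040, 1450]
planned_fuel_lb = [9000, 8200, 7300, 6500]
actual_fuel_lb = [8950, 8180, 7310, 6480]

time_dev = deviations(planned_time_s, actual_time_s)
fuel_dev = deviations(planned_fuel_lb, actual_fuel_lb)
time_trend = is_growing(time_dev)
fuel_trend = is_growing(fuel_dev)
```

In this made-up mission the time deviation grows at every turn point, the kind of trend an instructor might not notice from a single TP, while the fuel deviations show no steady growth.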


Figure 11 shows the usefulness of displaying the results
of an overall measure of performance: radar navigation accuracy.
The solid boundary lines between and surrounding each
TP are performance standards established by fleet A-6E CAINS
aircrew; in this situation a 90 percent confidence interval
has been constructed about the mean flight path. This display
supports evaluation of student navigational accuracy over the
entire mission and provides diagnostic navigational information
for each leg or TP. This figure illustrates the performance
of a student who has met fleet-established criterion
limits for all leg and TP navigation portions of the route
except TP number two.
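
One way such fleet-based limits might be computed and applied is sketched below. The sketch assumes that the limits at each TP are the fleet mean cross-track error plus or minus 1.645 standard deviations (a normal approximation to a 90 percent band); how the limits were actually constructed is not stated here, and all names and data values are hypothetical.

```python
from statistics import mean, stdev

Z_90 = 1.645  # two-sided 90 percent quantile of the standard normal


def fleet_limits(fleet_errors):
    """Lower and upper limits from fleet cross-track errors at one TP."""
    m, s = mean(fleet_errors), stdev(fleet_errors)
    return m - Z_90 * s, m + Z_90 * s


def outside_limits(fleet_by_tp, student_by_tp):
    """Return the TP numbers at which the student falls outside the limits."""
    flagged = []
    pairs = zip(fleet_by_tp, student_by_tp)
    for tp, (fleet, student) in enumerate(pairs, start=1):
        low, high = fleet_limits(fleet)
        if not low <= student <= high:
            flagged.append(tp)
    return flagged


# Hypothetical cross-track errors in nautical miles (left negative).
fleet_by_tp = [
    [-0.2, 0.1, 0.0, 0.3, -0.1],
    [0.1, -0.3, 0.2, 0.0, -0.1],
    [0.0, 0.2, -0.2, 0.1, -0.1],
]
student_by_tp = [0.1, 1.2, -0.1]
flagged = outside_limits(fleet_by_tp, student_by_tp)
```

With this made-up data the student is inside the fleet limits at every TP except the second, which mirrors the situation the figure is said to illustrate.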

Summarized performance of the entire mission is depicted
in Table XII. Each turn point (TP) is evaluated for B/N
equipment time-sharing activity on the previous leg, time-on-TP
management, fuel management, heading accuracy (as related
to planned run-in heading), navigational accuracy (reported
as a "P" if within criterion limits or reported in miles from
TP if not within limits), minimum altitude for the previous
leg, and indicated airspeed (IAS) at the TP.
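
The reporting rule for navigational accuracy in the summary can be written as a small function. The function name, the one-decimal format, and the 0.5-mile criterion used in the example are assumptions; the actual criterion would come from fleet-established standards.

```python
def nav_accuracy_entry(miss_distance_nm, criterion_nm):
    """Summary entry: "P" inside the criterion, otherwise miles from the TP."""
    if miss_distance_nm <= criterion_nm:
        return "P"
    return f"{miss_distance_nm:.1f}"


# Hypothetical miss distances at three turn points, criterion of 0.5 mile.
entries = [nav_accuracy_entry(d, 0.5) for d in (0.4, 1.3, 0.2)]
```

Reporting the miss distance only for failed TPs keeps the table compact while still giving the instructor diagnostic detail where it is needed.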

This brief discussion of information displays for B/N
performance is not exhaustive and does not reflect what may
be the most efficient and reliable measures for determining
skill acquisition. Only statistical methods will identify
those measures that should be displayed and used. The examples
presented here are for illustrative purposes only.

Figure 11. Radar Navigation Accuracy Display.


TABLE XII: RADAR NAVIGATION PERFORMANCE SUMMARY

F. IMPLEMENTING THE MODEL

A model for measuring B/N performance during radar navigation
in the A-6E WST has been designed and proposed for use
by the East and West Coast A-6 Fleet Replacement Squadrons.
The final model design was predicated on: (1) implementing
the model at minimum cost, (2) utilizing existing computer
algorithms and software that have been validated, and (3)
requiring no additional personnel to operate the model after
implementation. Some software changes are necessary, but
they appear to be minor in light of the 2F114 design specifications
for currently accessible programs. Some translation
may be necessary because of different computer languages, but
these changes are feasible given the implications for
reducing training costs and increasing the effectiveness of
both the instructor and the simulator. Additionally, a computer-managed
system will be necessary for implementing the
sequential sampling decision model. Currently available
desk-top computers could accomplish this function if
the simulator's computer capacity is fully utilized after
the measurement portion of the model is installed.

Objective performance measurement provides useful information
necessary for training evaluation and control. Performance
measurement models that incorporate this powerful
technique can increase simulator and instructor effectiveness,
reduce training costs, and may contribute toward
reducing accidents attributed to "unskilled" aviators.
Implementation of these systems appears to be cost-effective
in view of the potential savings from their effective utilization.

VIII. SUMMARY


The purpose of this thesis was to model a performance
measurement system for the Bombardier/Navigator (B/N) Fleet
Replacement Squadron (FRS) student during the radar navigation
maneuver in the A-6E Weapon System Trainer (WST, device 2F114)
that would best determine student skill acquisition and would
incorporate the advantages of both objective and subjective
aircrew performance measurement methods. This chapter is provided
as a compendium because of the extensive material covered
and assumes the reader is unfamiliar with previous chapters.

A.  STATEMENT  OF  PROBLEM 

Traditional and current FRS student performance measurement
and assessment in the A-6E WST by an instructor is mostly
subjective in nature with disadvantages of low reliability,
lack of established performance standards, and human perceptual
measurement inadequacies. The recently delivered A-6E WST has
the capability to objectively measure student performance but
is not being utilized in this fashion due to the lack of an
operational performance measurement system that incorporates
the characteristics of objective performance measurement and
still retains the valuable judgement and experience of the
instructor as a measuring and evaluating system component.
Objectivity in performance measurement is a highly desirable
component of the performance measurement and evaluation process
that enables the establishment of performance standards,
increases instructor and simulator effectiveness, and fulfills
the requirements of Department of Defense policy.

B.  APPROACH  TO  PROBLEM 

Designing a model to measure B/N performance during radar
navigation in the A-6E WST necessarily assumed that: (1) the A-6E
WST realistically duplicated the A-6E aircraft in both engineering
and mission aspects, (2) little variability in overall
A-6E crew-system performance is attributable to the pilot, (3)
results from pilot performance measurement literature
were applicable to the B/N, (4) a mathematical relationship
existed between some aspects of B/N behavior and performance
measurement and evaluation, and (5) competitively selected,
motivated, and experienced A-6E fleet aircrew exhibit advanced
skill or "proficiency" characterized by minimum effort and consistent
responses ordinarily found in actual aircraft flight.

The methodology used in formulating a model to measure B/N
performance was based on an extensive literature review of
aircrew performance measurement from 1962 to 1980 and an analytical
task analysis of the B/N's duties. After selection of
the Air Interdiction scenario and turn point-to-turn point
radar navigation flight segment, the review concentrated on
aircrew performance measurement research which emphasized
navigation, training, and skill acquisition. A brief review
was presented of the concepts of performance measurement and
evaluation including measure types, reliability, validity,
measure selection criteria, performance standards, aviation
measures of effectiveness, and types of evaluation within the
framework of aircrew training. A model was then formulated
to illustrate the relationship among student B/N skill acquisition,
the radar navigation task, and performance measurement
and evaluation. Candidate measures for navigation training
and the radar navigation flight segment were identified from
an original listing of 182 performance measures from previous
aircrew performance measurement research. The task analysis
was performed to identify skills and knowledge required of the
B/N and to identify candidate performance measures for both
B/N skill acquisition and the radar navigation segment. A
Mission Time Line Analysis (MTLA) was conducted to identify
B/N tasks critical to performance. A model was then formulated
to illustrate A-6E crew-system interaction and the complexity
involved in measuring B/N performance. Generic aircrew performance
measurement system concepts were reviewed for the
training environment. Current performance measurement and
evaluation practices of the A-6 FRS for the B/N student in
the WST were reviewed as well as the current objective performance
capabilities of the WST. A final list of candidate
measures was presented that had met selection criteria of face
validity, ease of use, instructor and student acceptance, and
appropriateness to training.

C. FORMULATION OF THE MODEL

The purpose of measuring B/N performance during radar
navigation in the WST was to provide objective, reliable, and
valid information for accurate decision-making about B/N skill
acquisition to the FRS instructor and training manager. The
model developed candidate measures (Table XI) that determined
what to measure, observation method, when to measure, scaling,
sampling frequency, criteria establishment, applicable transformations,
current availability in the A-6E WST, and accessibility
of the measure if not currently available. After
operationally defining the radar navigation segment, a proposal
was made to conduct an annual competitive exercise in
the A-6E WST under the cognizance of the appropriate Medium
Attack Wing (MATWING) command utilizing A-6E fleet squadron
aircrew. A-6E fleet squadron personnel would fly preprogrammed
radar navigation routes while their performance was measured
using the candidate measures previously developed. The results
from the proposed competitive exercise were cited as being
useful and operationally acceptable for establishing standards
of performance for each radar navigation route since the selected
aircrew would be highly motivated and fleet-experienced.
Statistical techniques for reducing the initial candidate
measures for B/N radar navigation performance were reviewed
and evaluated with respect to skill acquisition, and a multivariate
discriminant analysis model was selected as applicable
due to the model's utility and previous practical development
and applications.

The final part of the performance measurement model
proposed an evaluation application which used dichotomous
results of student performance compared to fleet performance
as information for several decision levels. Performance
results of the student for the majority of the statistically
reduced performance measures can be dichotomized by the instructor
as either "proficient" (skilled) or "not proficient"
(unskilled) based on the empirically established objective
performance standards or operationally defined subjective
performance standards. An evaluation model developed by Rankin
and McDaniel [1980] that used a sequential method of making
statistical decisions incorporating the dichotomized results
of student performance was adapted and modified for determining
successful completion of task training for the B/N based on
both objective and subjective performance measurement and evaluation.
The sequential sampling model was selected due to its
inherent power to fix decisional error rates, previous practical
developments, and potential for reducing training costs. Several
informational displays specific to the B/N radar navigation
segment and based on hypothetical model results were presented.
Model implementation was discussed with regard to personnel,
costs, A-6E WST software changes, and effective training control.

D.  IMPACT  TO  THE  FLEET 

The performance measurement model as outlined in this thesis
is specific to measuring A-6E B/N performance during radar
navigation but has generic qualities applicable to aircrew
members of any aircraft. The advantages of objective measurement
in the form of reduced paperwork, permanent performance
records, established performance standards, diagnostic
and timely information, and high reliability are fulfilled.

At  the  same  time,  subjective  measurements  for  those  aircrew 
behavioral  aspects  that  currently  defy  objective  measurement 
are  made  using  the  experienced  simulator  mission  instructor, 
who  also  remains  as  the  final  decision-maker  on  whether  or  not 
a  student  has  demonstrated  task  performance  that  reflects  an 
acquired  skill  for  that  task. 

The application of the model to individual aircrew readiness,
fleet squadron unit readiness, and selection of individual
aircrew teams for multi-crew aircraft appears to be feasible
and operationally acceptable. The model has potential utility
in tactics development, accident prevention, predictive performance,
and proficiency training of reserve aviators. Almost
certainly some reduction in aircrew training costs and an
increase in instructor, simulator, and training program effectiveness
would be realized.

APPENDIX A


A-6E  TRAM  RADAR  NAVIGATION  TASK  LISTING 


The enclosed task listing for the B/N during radar navigation
in the A-6E WST was compiled from various sources as
discussed in Chapter V. This was the first phase of developing
a task analysis for the purpose of measuring performance
of the B/N during radar navigation.


SEGMENT  1:  AFTER  TAKEOFF  CHECKS 

T1 ADJUST RADAR PANEL FOR SCOPE RETURN

    S1 SET SEARCH RADAR PWR SWITCH ..... ON
    S2 SET XMT SWITCH ..... NORM

T2 ACTIVATE SYSTEM STEERING TO INITIAL POINT (IP)

    S1 CHECK COMPTMODE SWITCH ..... STEER
    S2 DEPRESS TGT N ADDRESS KEY HAVING IP LAT/LONG
    S3 CHECK COMPT/MAN SWITCH ..... COMPT

T3 CHECK FOR ACCURATE SYSTEM STEERING TO IP

    S1 READ SYSTEM BEARING AND RANGE TO IP FROM
       DVRI BUG AND RANGE DISPLAYS
    S2 COMPARE SYSTEM BEARING AND RANGE TO IP
       WITH PRE-PLANNED OR ESTIMATED BEARING
       AND RANGE TO IP
    S3 GO TO T5 IF SYSTEM STEERING TO IP IS CORRECT
    S4 GO TO T4 IF SYSTEM STEERING TO IP IS NOT CORRECT


T4 TROUBLESHOOT SYSTEM STEERING IF REQUIRED

    S1 DETERMINE IP LAT/LONG FROM CHART OR IFR SUPPLEMENT
    S2 COMPARE SYSTEM IP LAT/LONG WITH ACTUAL IP LAT/LONG
        (a) THROW DDU DATA SWITCH ..... ON CALL
        (b) READ SYSTEM TGT N ADDRESS (IP)
            FROM LOWER DDU LAT/LONG DISPLAYS
    S3 INSERT CORRECT IP LAT/LONG IF REQUIRED
    S4 EVALUATE SYSTEM PRESENT POSITION AND
       ESTIMATED POSITION FROM CHART
        (a) THROW DDU DATA SWITCH ..... PRES POS
        (b) READ SYSTEM PRESENT POSITION LAT/LONG
            FROM LOWER DDU LAT/LONG DISPLAYS
        (c) COMPARE SYSTEM PRESENT POSITION WITH
            ESTIMATED PRESENT POSITION FROM CHART
        (d) INSERT CORRECT PRESENT POSITION IF REQUIRED
            (1) DEPRESS PRES LOC ADDRESS KEY
                ON COMPUTER KEYBOARD
            (2) THROW COMPTMODE SWITCH ..... ENTER
            (3) INSERT CORRECT PRESENT POSITION LAT/LONG
            (4) THROW COMPTMODE SWITCH ..... STEER
            (5) CHECK LOWER DDU LAT/LONG DISPLAYS
                FOR ACCURATE DATA ENTRY
    S5 INFORM PILOT OF APPROXIMATE HEADING TO IP
       IF REQUIRED AND GO TO T6


T5 INFORM PILOT THAT SYSTEM STEERING IS TO IP

    S1 CHECK THAT PILOT MAINTAINS SAFE FLIGHT
       AND FOLLOWS SYSTEM STEERING

T6 CHECKOUT RADAR FOR STATUS

    S1 GO TO NEXT EVENT IF RADAR RETURN IS PRESENT
    S2 GO TO T7 IF RADAR RETURN IS NOT PRESENT

T7 TROUBLESHOOT RADAR IF REQUIRED

    S1 CHECK TEST MODE SWITCH (RADAR/DRS TEST) ..... CENTERED
    S2 CHECK FAULT ISLN SWITCH ..... CENTERED
    S3 CHECK RADAR CIRCUIT BREAKER ..... IN
    S4 USE PCL CHECKLIST FOR RADAR TURN-UP PROCEDURES
    S5 ALERT PILOT IF RADAR INOPERATIVE AND
       ABORT RADAR NAVIGATION FLIGHT


SEGMENT  2:  NAVIGATION  TO  IP 

T1 TUNE RADAR FOR OPTIMUM PPI DISPLAY

    S1 ROTATE CONTRAST CONTROL ..... CW
    S2 ROTATE BRT CONTROL (UNTIL SWEEP PRESENT) ..... CW
    S3 CHECK VIDEO/DIF CONTROLS ..... CCW
    S4 ROTATE RCVR CONTROL
       (UNTIL RETURN IS PRESENT) ..... CW
    S5 CHECK DISPLAYS BUTTONS ..... PPI
    S6 ADJUST PPI RANGE CONTROL
       (UNTIL IP AT TOP OF SCOPE) ..... CW/CCW
    S7 ROTATE RNG MKR/AZ MKR CONTROLS
       (UNTIL CURSORS PRESENT) ..... CW
    S8 CHECK SCAN STAB CONTROL ..... ADL
    S9 ROTATE SCAN ANGLE CONTROL
       (UNTIL DESIRED SWEEP WIDTH PRESENT) ..... CW/CCW
    S10 CHECK SCAN RATE SWITCH ..... FAST
    S11 CHECK AMTI CONTROL ..... CCW
    S12 CHECK STC SLOPE/DEPTH CONTROLS ..... CW/CCW
    S13 THROW ANT PATT SWITCH
        (CONTINUE FOR 10 SECOND MINIMUM) ..... FAR
    S14 CHECK RCVR SWITCH ..... AFC
    S15 CHECK BEACON CONTROL ..... CCW
    S16 CHECK AZ-RNG TRKG SWITCH ..... OFF
    S17 CHECK FREQ AGILITY SWITCH ..... ON


T2 POSITION CURSOR INTERSECTION ON IP IF REQUIRED

    S1 PLACE LEFT HAND ON SLEW CONTROL STICK
    S2 DEPRESS RADAR SLEW BUTTON AND PUSH SLEW
       CONTROL STICK IN DIRECTION OF DESIRED
       CURSOR INTERSECTION MOVEMENT
    S3 COMPARE RADAR RETURN IMAGE ON CURSOR
       INTERSECTION WITH PREPLANNED CHART
       AND QUICK & DIRTY IP
    S4 REPEAT S1 AND S2 IF REQUIRED
    S5 PUSH CORRECT POS BUTTON ON LOWER DDU PANEL


T3 ACTIVATE DOPPLER RADAR

    S1 SET DOPPLER CONTROL SWITCH
       (LAND OR SEA AS APPROPRIATE) ..... ON
    S2 MONITOR DOPPLER CONTROL PANEL FOR
       DOPPLER RADAR STATUS
        (a) OBSERVE MEMORY LIGHT OUT FOR PROPER OPERATION
        (b) OBSERVE DRIFT AND GND SPEED DISPLAYS
            FOR DRIFT ANGLE AND GROUND SPEED PRESENT

T4 SELECT SYSTEM STEERING OR DEAD RECKONING
   (DR) NAVIGATION

    S1 DETERMINE STATUS OF INS, RADAR, AND DOPPLER RADAR
    S2 OBSERVE CURSOR INTERSECTION DRIFT ON RADAR SCOPE
    S3 GO TO T5 IF DEAD RECKONING IS SELECTED
       (LARGE CURSOR INTERSECTION DRIFT) OR SEGMENT 3,
       T1 IF SYSTEM NAVIGATION IS
       SELECTED (SMALL CURSOR INTERSECTION DRIFT)

T5 PERFORM DEAD RECKONING NAVIGATION
   (BEYOND THE SCOPE OF THIS REPORT)


SEGMENT 3: NAVIGATION TO TURN POINT (TP)

T1 INITIATE TURN AT IP

    S1 ALERT PILOT OF NEXT OUTBOUND HEADING
       ONE MINUTE PRIOR TO REACHING IP
    S2 SET OUTBOUND HEADING ON HORIZONTAL
       SITUATION INDICATOR (HSI) BY ROTATING
       HEADING SELECT KNOB UNTIL THE HEADING
       SELECT MARKER IS ON THE DESIRED HEADING
    S3 CHECK AZ-RNG TRKG SWITCH ..... OFF
    S4 MONITOR DVRI HEADING BUG FOR MOVEMENT
       TO 180° RELATIVE POSITION (IP PASSAGE)
    S5 ACTIVATE COCKPIT CLOCK AT IP PASSAGE
    S6 INFORM PILOT OF IP PASSAGE AND OUTBOUND HEADING
    S7 CHECK FOR PILOT TURNING TO NEW HEADING

T2 ACTIVATE SYSTEM STEERING TO TP

    S1 DEPRESS TGT N ADDRESS KEY HAVING NEXT TP LAT/LONG
    S2 CHECK COMPTMODE SWITCH ..... STEER

T3 CHECK FOR ACCURATE SYSTEM STEERING TO TP

    S1 READ SYSTEM BEARING AND RANGE TO TP
       FROM DVRI BUG AND RANGE DISPLAYS
    S2 COMPARE SYSTEM BEARING AND RANGE TO TP
       WITH PREPLANNED OR ESTIMATED BEARING
       AND RANGE TO TP
    S3 GO TO T5 IF SYSTEM STEERING TO TP IS CORRECT
    S4 GO TO T4 IF SYSTEM STEERING TO TP IS NOT CORRECT

T4 TROUBLESHOOT SYSTEM STEERING IF REQUIRED

    S1 DETERMINE TP LAT/LONG FROM CHART OR IFR SUPPLEMENT
    S2 COMPARE SYSTEM TP LAT/LONG WITH ACTUAL TP LAT/LONG
        (a) THROW DDU DATA SWITCH ..... ON CALL
        (b) READ SYSTEM TGT N ADDRESS (TP)
            FROM LOWER DDU LAT/LONG DISPLAYS
    S3 INSERT CORRECT TP LAT/LONG IF REQUIRED
    S4 EVALUATE SYSTEM PRESENT POSITION AND
       ESTIMATED POSITION FROM CHART
        (a) THROW DDU DATA SWITCH ..... PRES POS
        (b) READ SYSTEM PRESENT POSITION LAT/LONG
            FROM LOWER DDU LAT/LONG DISPLAYS
        (c) COMPARE SYSTEM PRESENT POSITION WITH
            ESTIMATED PRESENT POSITION FROM CHART
        (d) INSERT CORRECT PRESENT POSITION IF REQUIRED
            (1) DEPRESS PRES LOC ADDRESS KEY
                ON COMPUTER KEYBOARD
            (2) THROW COMPTMODE SWITCH ..... ENTER
            (3) INSERT CORRECT PRESENT POSITION LAT/LONG
            (4) THROW COMPTMODE SWITCH ..... STEER
            (5) CHECK LOWER DDU LAT/LONG DISPLAYS
                FOR ACCURATE DATA ENTRY

T5 INFORM PILOT THAT SYSTEM STEERING IS TO THE TP

    S1 CHECK THAT PILOT MAINTAINS SAFE FLIGHT
       AND FOLLOWS SYSTEM STEERING

T6 INSERT DATA FOR NEXT REMAINING TPs IF REQUIRED

    S1 THROW COMPTMODE SWITCH ..... ENTER
    S2 DEPRESS PREVIOUSLY UTILIZED TGT N ADDRESS KEY
    S3 DEPRESS POS ACTION KEY
    S4 DEPRESS APPROPRIATE QUANTITY KEYS
       IN SEQUENCE FOR TP LAT/LONG
    S5 DEPRESS ALT ACTION KEY
    S6 DEPRESS APPROPRIATE QUANTITY KEYS
       IN SEQUENCE FOR TP ALTITUDE
    S7 ALERT PILOT TO MAINTAIN PRESENT HEADING
    S8 THROW COMPTMODE SWITCH ..... STEER
    S9 THROW DDU DATA SWITCH ..... ON CALL
    S10 CHECK LOWER DDU LAT/LONG DISPLAYS FOR
        ACCURATE DATA ENTRY
    S11 REPEAT S1 THROUGH S10 IF REQUIRED
    S12 DEPRESS REQUIRED TGT N ADDRESS KEY FOR
        CURRENT TP SYSTEM STEERING
    S13 CHECK FOR ACCURATE SYSTEM STEERING TO TP
        (a) READ SYSTEM BEARING AND RANGE TO TP
            FROM DVRI BUG AND RANGE DISPLAYS
        (b) COMPARE SYSTEM BEARING AND RANGE TO
            TP WITH PREPLANNED OR ESTIMATED
            BEARING AND RANGE TO TP
    S14 REPEAT S12 AND S13 IF REQUIRED
    S15 INFORM PILOT THAT SYSTEM STEERING IS TO THE TP
        (a) CHECK THAT PILOT MAINTAINS SAFE
            FLIGHT AND FOLLOWS SYSTEM STEERING

T7  perform  system  navigation  tasks 

SI  TUNE  RADAR  FOR  OPTIMUM  PPI  DISPLAY 


(a)  ADJUST  PPI  RANGE  CONTROL 

(UNTIL  TP  AT  TOP  OF  SCOPE) . CW 

(b)  ADJUST  RCVR  CONTROL 

(UNTIL  RETURN  IS  ENHANCED) . CW/CCW 

(C)  ADJUST  STC  DEPTH  CONTROL 

(UNTIL  EVEN  RETURN  PRESENT) . CW 
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(d)  ADJUST  SCAN  ANGLE  CONTROL 

(UNTIL  DESIRED  SWEEP  WIDTH  PRESENT) .. CW/CCW 

52  MONITOR  FLIGHT  PROGRESS  USING  RADAR 
SIGNIFICANT  TERRAIN/CULTURAL  FEATURES 
AS  CHECK  POINTS 

(a)  COMPARE  RATAR  RETURN  IMAGE  WITH 
PREPLANNED  CHART  AND  QUICK  &  DIRTY 

(b)  COMPARE  SYSTEM  PRESENT  POSITION  WITH 
ESTIMATED  PRESENT  POSITION  FROM  CHART 
IF  REQUIRED 

(1)  CHECK  DDU  DATA  SWITCH . PRES  POS 

(2)  READ  SYSTEM  PRESENT  POSITION 
LAT/LONG  FROM  LOWER  DDU  LAT/LONG 
DISPLAYS 

(3)  RECORD  SYSTEM  PRESENT  POSITION 
ON  CHART 

(4)  READ  TIME- INTO- LEG  OR  TOTAL  TIME 
FROM  COCKPIT  CLOCK 

(5)  RECORD  ESTIMATED  PRESENT  POSITION 
ON  CHART 

(c)  REPEAT  (a)  AND  (b)  IF  REQUIRED 

53  INFORM  PILOT  OF  SYSTEM  NAVIGATIONAL  ACCURACY 

54  POSITION  CURSOR  INTERSECTION  ON  TP  IF 
REQUIRED 

(a)  PLACE  LEFT  HAND  ON  SLEW  CONTROL  STICK 

(b)  DEPRESS  RADAR  SLEW  BUTTON  AND  PUSH  SLEW 
CONTROL  STICK  IN  DIRECTION  OF  DESIRED 
CURSOR  INTERSECTION  MOVEMENT 

(c)  COMPARE  RADAR  RETURN  IMAGE  ON  CURSOR 
INTERSECTION  WITH  PREPLANNED  CHART 
AND  QUICK  &  DIRTY  TP 

(d)  REPEAT  (a)  AND  (b)  IF  REQUIRED 

(e)  PUSH  CORRECT  POS  BUTTON  ON  LOWER 
DDU  PANEL 

S5 MONITOR NAVIGATIONAL EQUIPMENT FOR
OPERATING  STATUS  OR  EQUIPMENT  CONDITION 

(a)  CHECK  NAVIGATION  (INS)  CONTROL 
PANEL  FOR  INS  FAILURE  INDICATIONS 

(b)  CHECK  VTR  PANEL  FOR  OPERATING 
INDICATIONS  IF  REQUIRED 

(c) CHECK FOR COMPUTER ERROR LIGHT ON
DVRI PANEL AND LOWER DDU PANEL

(d) CHECK ATTITUDE REF SWITCH . COMP IN

(e) CHECK MAGNETIC VARIATION FROM MAG
VAR DISPLAY ON LOWER DDU PANEL

(1) COMPARE TO CHART MAGNETIC
VARIATION AND SET IF NECESSARY

(f)  INFORM  PILOT  OF  NAVIGATIONAL 
EQUIPMENT  STATUS 

S6 MONITOR TIME ON TP

(a)  COMPARE  RECORDED  ACTUAL  LEG  OR  TOTAL 
TIME  FOR  PREVIOUS  TP  WITH  PREPLANNED 
LEG  OR  TOTAL  TIME  FOR  PREVIOUS  TP 

(b)  INSTRUCT  PILOT  TO  ADJUST  THROTTLE 
CONTROLS  SO  AS  TO  CORRECT  TIME  ON  TP 

(c) INFORM PILOT OF TIME ON TP RESULTS

S7 MONITOR SYSTEM VELOCITIES

(a)  READ  SYSTEM  GROUND  SPEED  (GS)  AND 
WIND 

(1)  THROW  DDU  DATA  SWITCH . DATA 

(2)  RECORD  GROUND  SPEED  IN  G/S 
DISPLAY 

(3)  RECORD  WIND  SPEED  AND  DIRECTION 
IN  WIND  SPEED/WIND  DIR  DISPLAYS 

(b)  READ  AND  RECORD  GS  FROM  DOPPLER  PANEL 
GND  SPEED  DISPLAY 

(c)  READ  AND  RECORD  INDICATED  AIRSPEED 
(IAS)  FROM  AIRSPEED  INDICATOR  ON 
PILOT'S  INSTRUMENT  PANEL 

(d) READ SYSTEM TRUE AIRSPEED (TAS)

(1) THROW DDU DATA SELECT SWITCH . A

(2) RECORD TAS FROM DISPLAY 1

(e) EVALUATE SYSTEM VELOCITIES USING GS,
WIND, IAS, AND TAS

(f)  TROUBLESHOOT  SYSTEM  VELOCITIES  IF 
REQUIRED 

(g)  INFORM  PILOT  OF  SYSTEM  VELOCITY  RESULTS 

S8 MONITOR SYSTEM HEADING

(a) READ TRUE HEADING FROM DVRI DISPLAY BUG

(b)  READ  MAGNETIC  HEADING  FROM  WET  COMPASS 

(c)  READ  HEADING  FROM  HORIZONTAL  SITUATION 
INDICATOR 

(d)  READ  MAGNETIC  VARIATION  FROM  MAG  VAR 
DISPLAY  ON  LOWER  DDU  PANEL 

(e)  EVALUATE  HEADING  ACCURACY  USING  TRUE 
HEADING,  MAGNETIC  VARIATION,  AND  COMPASS 
HEADING  DATA  WITH  "CDMVT"  FORMULA 

(f)  ADJUST  MA-1  COMPASS  NEEDLE  DEFLECTION 
WITH  PULL  TO  SET  CONTROL  IF  REQUIRED 

(g)  TROUBLESHOOT  SYSTEM  HEADING  IF  REQUIRED 

(h)  INSTRUCT  PILOT  TO  ADJUST  HEADING  SO  AS 
TO  MAINTAIN  PREPLANNED  COURSE 

(i)  INFORM  PILOT  OF  SYSTEM  HEADING  RESULTS 

S9 MONITOR SYSTEM ALTITUDE

(a)  READ  ALTITUDE  (AGL)  FROM  RADAR 
ALTIMETER 

(b)  READ  PRESSURE  ALTITUDE  (MSL)  FROM 
PRESSURE  ALTIMETER 

(c)  READ  SYSTEM  ALTITUDE  (MSL) 

(1)  THROW  DDU  DATA  SWITCH . PRES  POS 

(2) READ ALTITUDE IN ALT DISPLAY

(d) READ TERRAIN ALTITUDE FOR PRESENT
POSITION ON CHART

(e) EVALUATE SYSTEM ALTITUDE ACCURACY
USING ABOVE SOURCES

(f) TROUBLESHOOT SYSTEM ALTITUDE IF
REQUIRED

(g) INSERT CORRECT ALTITUDE IF REQUIRED

(h) INFORM PILOT OF SYSTEM ALTITUDE
RESULTS

S10 MONITOR SAFETY OF FLIGHT INSTRUMENTS
AND EQUIPMENT

(a) EVALUATE FUEL STATUS BY COMPARING
ACTUAL FUEL REMAINING WITH PREPLANNED
FUEL REMAINING

(b) INFORM PILOT OF FUEL STATUS

(c) CHECK ANNUNCIATOR CAUTION LIGHTS
FOR EMERGENCY INDICATIONS

(d) CHECK FIRE WARNING LIGHTS FOR AIRCRAFT
FIRE INDICATIONS

(e) CHECK ACCESSIBLE CIRCUIT BREAKERS . IN

(f) INFORM PILOT OF SAFETY OF FLIGHT
INSTRUMENTS AND EQUIPMENT RESULTS

T8 PERFORM APPROACH TO TP PROCEDURES

S1 CHECK FLIGHT PROGRESS USING RADAR
SIGNIFICANT TERRAIN/CULTURAL FEATURES AS
CHECK POINTS

(a) COMPARE RADAR RETURN IMAGE ON CURSOR
INTERSECTION WITH PREPLANNED CHART
AND QUICK & DIRTY

(b) POSITION CURSOR INTERSECTION ON TP
IF REQUIRED

(c) REPEAT (a) AND (b) IF REQUIRED

(d) PUSH CORRECT POS BUTTON ON LOWER
DDU PANEL

S2 SELECT ARE 30/60 DISPLAY AT APPROXIMATELY
17 MILES FROM TP AND NAVIGATION IS
ACCURATE

S3 OBSERVE ARE 30/60 EXPANDED DISPLAY AT
APPROXIMATELY 17 MILES

S4 TUNE RADAR FOR OPTIMUM ARE 30/60 DISPLAY

(a) ROTATE STC SLOPE CONTROL . CCW

(b) THROW ANT PATT SWITCH (CONTINUE
UNTIL BRIGHTEST RETURN PRESENT) . NEAR

(c) ADJUST RCVR CONTROL
(UNTIL RETURN IS ENHANCED) . CCW

(d)  ADJUST  SCAN  ANGLE  CONTROL . CW/CCW 

(e)  CHECK  AZ-RNG  TRKG  SWITCH . OFF 

(f)  CHECK  ELEV  TRKG  SWITCH . OFF 

(g)  ADJUST  VIDEO/DIF  CONTROLS  TO 
ENHANCE  RETURN  RESOLUTION  IF 
REQUIRED 


S5  CONTINUE  POSITIONING  CURSOR  INTERSECTION 
ON  FRONT  LEADING  EDGE  CENTER  OF  TURN 
POINT  RETURN 

T9  PERFORM  VELOCITY  CORRECT  PROCEDURES 

S1 GO TO T9.1 FOR AUTOMATIC VELOCITY CORRECT
(AZ-RANGE LOCK-ON OR TRACK-WHILE-SCAN
REQUIRED)

S2 GO TO T9.2 FOR MANUAL VELOCITY CORRECT

S3 FLIR TRACKING MANUAL VELOCITY CORRECT
(NOT  PART  OF  SIMULATOR  CAPABILITY) 

T9.1  PERFORM  AUTOMATIC  VELOCITY  CORRECT 

S1 POSITION AZIMUTH CURSOR TO CENTER OF TP
RETURN

S2 POSITION RANGE CURSOR TO JUST LEADING
EDGE OF TP RETURN

S3 THROW AZ-RNG TRKG SWITCH . ON

S4 CHECK AZ-RANGE INDICATOR LIGHT . ON

S5 ROTATE VELOCITY CORRECT SWITCH . MEMORY POINT

S6 CHECK DDU DATA SELECT SWITCH . A

S7 CHECK FLT DATA DISPLAY 4 FOR SOME
VALUE GREATER THAN 000

S8 ROTATE VELOCITY CORRECT SWITCH
(BEFORE TP WALK-DOWN) . OFF SAVE

T9.2 PERFORM MANUAL VELOCITY CORRECT

S1 POSITION AZIMUTH CURSOR TO CENTER OF
TP RETURN

S2 POSITION RANGE CURSOR TO JUST LEADING
EDGE OF TP RETURN

S3 CHECK AZ-RNG TRKG SWITCH . OFF

S4 ROTATE VELOCITY CORRECT SWITCH . MEMORY POINT

S5 DELAY FOR 10 SECOND MINIMUM/128 SECOND
MAXIMUM TO ALLOW CURSOR DRIFT

S6 REPEAT POSITIONING BEARING AND RANGE
CURSORS TO JUST LEADING EDGE CENTER
OF TP RETURN

S7 MONITOR FURTHER CURSOR DRIFT AND
REPEAT POSITIONING IF REQUIRED

S8 CHECK DDU DATA SELECT SWITCH . A

S9 CHECK FLT DATA DISPLAY 4 FOR SOME VALUE
GREATER THAN 000

S10 ROTATE VELOCITY CORRECT SWITCH
(BEFORE TP WALK-DOWN) . OFF SAVE

T10  INITIATE  TURN  AT  TP 

S1 ALERT PILOT OF NEXT OUTBOUND HEADING
ONE MINUTE PRIOR TO REACHING TP

S2 SET OUTBOUND HEADING ON HSI

S3 CHECK AZ-RNG TRKG SWITCH . OFF

S4 MONITOR DVRI HEADING BUG FOR MOVEMENT
TO 180° RELATIVE POSITION (TP PASSAGE)

S5 RECORD LEG TIME OR TOTAL ELAPSED TIME
AT TP PASSAGE

S6 ACTIVATE COCKPIT CLOCK AT TP PASSAGE IF
REQUIRED (LEG TIME ONLY)

S7 INFORM PILOT OF TP PASSAGE AND OUTBOUND
HEADING

S8 CHECK FOR PILOT TURNING TO NEW HEADING


For  subsequent  navigation  legs,  return  to  Segment  3,  Task  2 
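
Subtask element T7 S8(e) in the listing above evaluates heading accuracy with the "CDMVT" formula, which the listing does not spell out. The sketch below assumes the name refers to the standard Compass, Deviation, Magnetic, Variation, True correction chain with easterly corrections treated as positive. The function names and the sample values are illustrative assumptions, not part of the task listing.

```python
def compass_to_true(compass_deg, deviation_deg, variation_deg):
    """Apply the assumed CDMVT chain: Compass + Deviation = Magnetic,
    Magnetic + Variation = True (east positive, west negative)."""
    magnetic = (compass_deg + deviation_deg) % 360.0
    true = (magnetic + variation_deg) % 360.0
    return magnetic, true


def heading_error(system_true_deg, compass_deg, deviation_deg, variation_deg):
    """Signed difference (degrees, -180..180) between the system true
    heading and the true heading derived from the compass."""
    _, expected_true = compass_to_true(compass_deg, deviation_deg, variation_deg)
    return (system_true_deg - expected_true + 180.0) % 360.0 - 180.0


if __name__ == "__main__":
    # Hypothetical readings: compass 090, deviation 2 E, variation 10 W.
    print(compass_to_true(90.0, 2.0, -10.0))
    print(heading_error(85.0, 90.0, 2.0, -10.0))
```

A small error suggests that the system heading agrees with the compass-derived value; how large an error is acceptable is not stated in the listing.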

APPENDIX B


GLOSSARY  OF  TASK  SPECIFIC  BEHAVIORS 


The  enclosed  glossary  of  31  specific  behaviors,  or  action 
verbs,  was  excerpted  from  Oiler  [1968].  Each  verb  has  a 
specific  meaning  and  is  acceptable  in  the  sense  that  all 
synonyms  have  been  eliminated.  Each  action  verb  is  used  in 
the task listing (Appendix A), task analysis (Appendix C), and
the MTLA (Appendix D), for the purpose of defining observable
behavior  that  may  be  measured  in  terms  of  task  performance. 

Activate - Provide the initial force or action to begin an
operation  of  some  equipment  configuration. 

Adjust - Manipulate controls, levers, linkages and other equipment
items to return equipment from an out-of-tolerance condition
to an in-tolerance condition.

Alert  -  Inform  designated  persons  that  a  certain  condition 
exists  in  order  to  bring  them  up  to  a  watchful  state  in  which 
a  quick  reaction  is  possible. 

Check  -  Examine  to  determine  if  a  given  action  produces  a 
specified  result;  to  determine  that  a  presupposed  condition 
actually  exists,  or  to  confirm  or  determine  measurements  by 
the  use  of  visual,  auditory,  tactile,  or  mechanical  means. 

Checkout  -  Perform  routine  procedures,  which  are  discrete, 
ordered  stepwise  actions  designed  to  determine  the  status  or 
assess  the  performance  of  an  item. 

Compare  -  Examine  the  characteristics  of  two  or  more  items  to 
determine  their  similarities  and  differences. 

Continue - Proceed in the performance of some action, procedure,
etc., or to remain on the same course or direction (e.g.,
continue to check the temperature fluctuations; continue to
adjust the controls; and continue on the same heading).

Delay  -  Wait  a  brief  period  of  time  before  taking  a  certain 
action  or  making  a  response. 

Depress  -  Apply  manual  (as  opposed  to  automatic)  pressure  to 
activate  or  initiate  an  action  or  to  cause  an  item  of  equipment 
to  function  or  cease  to  function. 

Determine - Find, discover, or detect a condition (e.g., determine
degree of angle).

Evaluate - Judge or appraise the worth or amount of a unit
of equipment, operational procedure or condition (e.g.,
evaluate status of life support systems).

Inform  -  Pass  on  information  in  some  appropriate  manner  to 
one or more persons about a condition, event, etc., that they
should  be  aware  of. 

Initiate  -  Give  a  start  to  a  plan,  idea,  request,  or  some  form 
of  human  action  (e.g.,  initiate  a  new  safety  procedure). 

Insert  -  Place,  put,  or  thrust  something  within  an  existing 
context  (e.g.,  insert  a  part  in  the  equipment,  insert  a  request 
in  the  computer). 

Instruct  -  Impart  information  in  an  organized,  systematic  manner 
to  one  or  more  persons. 

Monitor  -  Observe  continually  or  periodically  visual  displays, 
or  listen  for  or  to  audio  displays,  or  vibrations  in  order  to 
determine  equipment  condition  or  operating  status. 

Observe  -  Note  the  presence  of  mechanical  motion,  the  condition 
of  an  indicator,  or  audio  display,  or  other  sources  of  movement 
or  audible  sounds  on  a  nonperiodic  basis. 

Perform  -  Carry  out  some  action  from  preparation  to  completion 
(It  is  understood  that  some  special  skill  or  knowledge  is 
required  to  successfully  accomplish  the  action.). 

Place  -  Transport  an  object  to  an  exact  location. 

Position  -  Turn,  slide,  rotate,  or  otherwise  move  a  switch, 
lever,  valve  handle,  or  similar  control  device  to  a  selected 
orientation  about  some  fixed  reference  point. 

Push - Exert a force on an object in such a manner that the
object will move or tend to move away from the origin of the
force.

Read - Use one's eyes to comprehend some standardized form of
visual symbols (e.g., sign, gauge, or chart).

Record - Make a permanent account of the results of some
action,  test,  event,  etc.,  so  that  the  authentic  evidence 
will  be  available  for  subsequent  examination. 

Repeat  -  Perform  the  same  series  of  tests,  operations,  etc., 
over  again,  or  perform  an  identical  series  of  tasks,  tests, 
operations,  etc. 

Rotate - Apply manual torque to cause a multiple position
rotary switch or a constantly varying device like a handwheel,
thumbwheel, or potentiometer to move in a clockwise or
counterclockwise manner.

Select - Choose, or to be commanded to choose, an alternative
from among a series of similar choices (e.g., select a proper
transmission frequency).

Set - Move pointers, clock hands, etc., to a position in
conformity with a standard, or place mechanical controls in a
predetermined position.

Throw  -  Change  manually  the  setting  of  a  toggle  switch  from 
one  position  to  another. 

Troubleshoot  -  Examine  and  analyze  failure  reports,  equipment 
readouts, test equipment meter values, failure symptoms, etc.,
to  isolate  the  source  of  malfunction. 

Tune  -  Adjust  an  item  of  equipment  to  a  prescribed  operating 
condition. 

Use  -  Utilize  some  unit  of  equipment  or  operational  procedure. 

APPENDIX C


A-6E  TRAM  RADAR  NAVIGATION  TASK  ANALYSIS 


The purpose of performing a task analysis for measuring
B/N performance during radar navigation was to provide candidate
performance measure metrics that may describe either
successful task performance or B/N skill acquisition. Only
segment three, Navigation to TP, was examined to limit the
scope of the task analysis. The sequential flow of tasks,
subtasks, and subtask elements (defined below) is the same
as that found in the A-6E TRAM radar navigation task listing
(Appendix A). The seven columns of the task analysis form
were defined in Chapter V but the definitions are repeated
here for the convenience of the reader:

(1) Subtask - a component activity of a task. Collectively,
all subtasks within a task comprise the task. Subtasks
are represented by the letter "S" followed immediately
by a numeral. Subtask elements are represented by a
small letter in parentheses.

(2) Feedback - the indication of adequacy of response or
action. Feedback is classified as VISUAL, TACTILE, AUDITORY,
or VESTIBULAR and is listed in the subtask column for
convenience only.

(3) Action Stimulus - the event or cue that instigates
performance of the subtask. This stimulus may be an
out-of-tolerance display indication, a requirement of periodic
inspection, a command, a failure, etc.




(4)  Time  -  the  estimated  time  in  seconds  to  perform  the 
subtask  or  task  element  calculated  from  initiation  to 
completion. 

(5) Criticality - the relationship between mission success
and below-minimum performance, or excessive performance
time, on a particular subtask or subtask element.
"High" (H) indicates poor subtask performance
may lead to mission failure or an accident. "Medium"
(M) indicates the possibility of degraded mission
capability. "Low" (L) indicates that poor performance may
have little effect on mission success.

(6) Potential Error - errors are classified as failure to
perform the task (OMIT), performing the task inappropriately
in time or accuracy (COMMIT), or performing sequential
task steps in the incorrect order (SEQUENTIAL).

(7)  Skills  Required  -  the  taxonomy  of  training  objectives  used 
for  the  Grumman  task  analysis  was  retained  and  presented 
in  Table  VII  [Campbell,  et  al.,  1977]. 

(8) Performance Measure Metrics - a candidate metric which
may best describe the successful performance of the task
or a genuine display of the required skills. The types
of metrics suggested were classified as: TIME (time in
seconds from start to finish of task), T-S (time-sharing,
or proportion of time that a particular task is performed
in relation to other tasks being performed in the same
time period), R-T (reaction time in seconds from the onset
of an action stimulus to task initiation), ACC (accuracy
of task performance), FREQ (number of task occurrences),
DEC (decisions made as a correct or incorrect choice
depending on the particular situation and mission
requirements), QUAL (quality of a task, especially in regard to
radar scope tuning quality), and SUBJ (subjective observation
or comprehension of the task execution success by an
instructor).
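
Several of the metric types defined in item (8) are simple functions of recorded event times. The following is a minimal sketch of how TIME, R-T, FREQ, and T-S might be computed from a log of task occurrences. The `TaskEvent` record and its field names are hypothetical, and T-S is interpreted here as the fraction of an observation window during which the task was being performed, which is only one reading of the definition above.

```python
from dataclasses import dataclass


@dataclass
class TaskEvent:
    """One occurrence of a subtask (hypothetical record layout)."""
    stimulus_t: float  # onset of the action stimulus, seconds
    start_t: float     # task initiation, seconds
    end_t: float       # task completion, seconds


def task_time(event):
    """TIME: seconds from start to finish of the task."""
    return event.end_t - event.start_t


def reaction_time(event):
    """R-T: seconds from stimulus onset to task initiation."""
    return event.start_t - event.stimulus_t


def frequency(events):
    """FREQ: number of task occurrences."""
    return len(events)


def time_share(events, window_start, window_end):
    """T-S: proportion of the window spent on this task.
    Assumes occurrences of the same task do not overlap."""
    busy = 0.0
    for e in events:
        overlap = min(e.end_t, window_end) - max(e.start_t, window_start)
        busy += max(0.0, overlap)
    return busy / (window_end - window_start)
```

ACC, DEC, QUAL, and SUBJ depend on task-specific criteria or instructor judgment and are not sketched here.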

The right-hand column of the task analysis form ("performance
measure metrics") provided several hundred possible
candidate measures for describing successful task performance
or B/N skill acquisition. Using initial measure selection
criteria as outlined in Chapter IV, these measures were reduced
and combined with literature review candidate measures (Table
II of Chapter IV) to produce the final candidate measure set
as shown in Table XI of Chapter VII.

APPENDIX D


RADAR  NAVIGATION  MISSION  TIME  LINE  ANALYSIS 


The Mission Time Line Analysis (MTLA) relates the sequence
of tasks to be performed by the operator to a real time basis,
and can be used to identify critical tasks within a maneuver
that are important for performance measurement [Matheny, et
al., 1970]. Using the segment three portion of the A-6E TRAM
radar navigation task listing (Appendix A), each task/subtask
was listed along the vertical axis of the time line. The
estimated time to perform each task and subtask was then
extracted from the task analysis (Appendix C) and plotted along
the horizontal axis, which represents in this example a
seven-minute radar navigation TP-to-TP "leg." Time is coded as:
(1) dark if the task must be executed for maneuver success or
if the task requires complete operator attention, or (2) shaded
if the task is one of monitoring or troubleshooting and can be
performed simultaneously with other tasks.
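
The coding scheme described above can be illustrated with a small text rendering of a time line. This is only a sketch: the 30-second column width, the `#` and `/` marks for dark and shaded time, and the task start times and durations in the example are assumptions chosen for illustration, not values taken from the task analysis.

```python
def render_timeline(rows, total_s=420, step_s=30):
    """Render one text line per task across a seven-minute (420 s) leg.

    rows: list of (label, start_s, duration_s, kind), where kind is
    'dark' (task needs full attention) or 'shaded' (monitoring task).
    """
    marks = {"dark": "#", "shaded": "/"}
    n_cells = total_s // step_s
    lines = []
    for label, start, duration, kind in rows:
        end = start + duration
        cells = []
        for i in range(n_cells):
            cell_start = i * step_s
            cell_end = cell_start + step_s
            # A cell is marked if the task overlaps any part of it.
            active = start < cell_end and end > cell_start
            cells.append(marks[kind] if active else ".")
        lines.append(f"{label:<6}{''.join(cells)}")
    return lines


if __name__ == "__main__":
    # Hypothetical timings for two tasks on one leg.
    for line in render_timeline([("T7 S2", 60, 90, "shaded"),
                                 ("T10", 390, 30, "dark")]):
        print(line)
```

Rows whose dark cells fall in the same columns would indicate tasks competing for the operator's full attention at the same time.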

The MTLA is a large graph but is presented here as two
task pages (T1 to T7, S8; and T7, S9 to T10), each followed by
two time pages (0 to 3+30, and 3+30 to 7+00). By removing
and appropriately arranging the six pages, the full MTLA graph
will result.

SEGMENT 3: NAVIGATION TO TURN POINT (TP)

T1 INITIATE TURN AT IP

S1 ALERT PILOT OF OUTBOUND HEADING
S2 SET HEADING ON HSI
S3 CHECK AZ-RNG TRKG SWITCH
S4 MONITOR DVRI FOR IP PASSAGE
S5 ACTIVATE COCKPIT CLOCK
S6 INFORM PILOT OF IP PASSAGE
S7 CHECK FOR PILOT TURNING

T2 ACTIVATE STEERING TO TP

S1 DEPRESS TGT N ADDRESS KEY
S2 CHECK COMPTMODE SWITCH

T3 CHECK FOR ACCURATE STEERING

S1 READ SYSTEM BEARING AND RANGE
S2 COMPARE BEARING AND RANGES

T4 TROUBLESHOOT STEERING IF REQUIRED

S1 DETERMINE ACTUAL TP LAT/LONG
S2 COMPARE ACTUAL/SYSTEM TP
S3 INSERT CORRECT TP LAT/LONG
S4 EVALUATE SYSTEM PRESENT POSITION

T5 INFORM PILOT OF STEERING TO TP

T6 INSERT DATA FOR NEXT TP(s)

S1 THROW COMPTMODE SWITCH
S2 DEPRESS OLD TGT N ADDRESS KEY
S3 DEPRESS POS ACTION KEY
S4 DEPRESS QUANTITY KEYS
S5 DEPRESS ALT ACTION KEY
S6 DEPRESS QUANTITY KEYS
S7 ALERT PILOT TO MAINTAIN HEADING
S8 THROW COMPTMODE SWITCH
S9 THROW DDU DATA SWITCH
S10 CHECK FOR ACCURATE DATA ENTRY
S11 REPEAT INSERT IF REQUIRED
S12 DEPRESS TGT N ADDRESS KEY FOR TP
S13 CHECK FOR ACCURATE STEERING
S14 REPEAT S12/S13 IF REQUIRED
S15 INFORM PILOT OF STEERING TO TP

T7 PERFORM SYSTEM NAVIGATION TASKS

S1 TUNE RADAR FOR OPTIMUM DISPLAY
S2 MONITOR FLIGHT PROGRESS
S3 INFORM PILOT OF NAV ACCURACY
S4 POSITION CURSORS ON TP
S5 MONITOR NAVIGATION EQUIPMENT
S6 MONITOR TIME ON TP
S7 MONITOR SYSTEM VELOCITIES
S8 MONITOR SYSTEM HEADING

SEGMENT 3: (continued)

T7 (continued)

S9 MONITOR SYSTEM ALTITUDE
S10 MONITOR FLIGHT SAFETY INSTR.

T8 PERFORM APPROACH TO TP PROCEDURES

S1 CHECK FLIGHT PROGRESS
S2 SELECT ARE 30/60 DISPLAY
S3 OBSERVE ARE 30/60 DISPLAY
S4 TUNE RADAR FOR OPTIMUM DISPLAY
S5 CONTINUE CURSOR POSITIONING

T9.1 PERFORM AUTOMATIC VELOCITY CORRECT

S1 POSITION AZIMUTH CURSOR
S2 POSITION RANGE CURSOR
S3 THROW AZ-RNG TRKG SWITCH
S4 CHECK FOR AZ-RNG LOCK-ON
S5 ROTATE VELOCITY CORRECT SWITCH
S6 CHECK DDU DATA SELECT SWITCH
S7 CHECK A-4 DISPLAY
S8 ROTATE VELOCITY CORRECT SWITCH

T9.2 PERFORM MANUAL VELOCITY CORRECT

S1 POSITION AZIMUTH CURSOR
S2 POSITION RANGE CURSOR
S3 CHECK AZ-RNG TRKG SWITCH
S4 ROTATE VELOCITY CORRECT SWITCH
S5 DELAY FOR 10-SEC MINIMUM
S6 REPEAT CURSOR POSITIONING
S7 MONITOR FURTHER CURSOR DRIFT
S8 CHECK DDU DATA SELECT SWITCH
S9 CHECK A-4 DISPLAY
S10 ROTATE VELOCITY CORRECT SWITCH

T10 INITIATE TURN AT TP

S1 ALERT PILOT OF OUTBOUND HEADING
S2 SET HEADING ON HSI
S3 CHECK AZ-RNG TRKG SWITCH
S4 MONITOR DVRI FOR TP PASSAGE
S5 RECORD TIME OF TP PASSAGE
S6 ACTIVATE COCKPIT CLOCK
S7 INFORM PILOT OF TP PASSAGE

RETURN TO SEGMENT 3, TASK 2

APPENDIX E


SEQUENTIAL  SAMPLING  DECISION  MODEL 

This appendix presents the sequential sampling decision
model and its parameters in sufficient detail for the reader
unfamiliar with the background theory of the model. Much of
the material is excerpted from Rankin and McDaniel [1980] in
a method proposal for achieving improvements in the precision
of determining FRS student aviator proficiency using a Computer
Aided Training Evaluation and Scheduling (CATES) system.
CATES provides a computer managed, prescriptive training program
based on individual student performance, and could be
utilized for the evaluation portion of the model to measure
B/N performance by either simulator software incorporation or
by desk-top minicomputers.




I.  CATES  DECISION  MODEL 

One sequential method that may be used as a means for
making statistical decisions with a minimum sample was introduced
by Wald [1947]. Probability ratio tests and corresponding
sequential procedures were developed for several statistical
distributions. One of the tests, the binomial probability
ratio test, was formulated in the context of a sampling procedure
to determine whether a collection of a manufactured product
should be rejected because the proportion of defectives is too
high or should be accepted because the proportion of defectives
is below an acceptable level. The sequential testing
procedure also provides for a postponement of decisions concerning
acceptance or rejection. This deferred decision is
based on prescribed values of alpha (α) and beta (β). Alpha
(α) limits errors of declaring something "True" when it is
"False" (Type I error). Beta (β) limits errors of declaring
something "False" when it is "True" (Type II error).

In an industrial quality control setting, the inspector
needs a chart similar to Figure E1 to perform a sequential
test to determine if a manufacturing process has turned out a
lot with too many defective items or whether the proportion
of defects is acceptable. As each item is observed, the inspector
plots a point on the chart one unit to the right if
it is not defective, or one unit to the right and one unit up if
the item is defective. If the plotted line crosses the upper
parallel line, the inspector will reject the production lot.
If the plotted line crosses the lower parallel line, the lot
will be accepted. If the plotted line remains between the
two parallel lines of the sequential decision chart, another
sample item will be drawn and observed/tested.
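
The inspector's plotting rule can be sketched in a few lines of code. The sketch below assumes the two parallel boundaries are already known and are each described by an intercept and a common slope. The function name, these three parameters, and the numeric values in the example are hypothetical, and reaching a boundary exactly is treated here as crossing it.

```python
def walk_chart(items, upper_intercept, lower_intercept, slope):
    """Follow the plotted line across a sequential sampling chart.

    items: iterable of booleans, True when the observed item is defective.
    The boundaries are the parallel lines intercept + slope * n.
    Returns (decision, number_of_items_observed).
    """
    n = 0        # horizontal position: items observed so far
    defects = 0  # vertical position: defective items so far
    for is_defective in items:
        n += 1                 # every item moves one unit to the right
        if is_defective:
            defects += 1       # a defective item also moves one unit up
        if defects >= upper_intercept + slope * n:
            return ("reject lot", n)
        if defects <= lower_intercept + slope * n:
            return ("accept lot", n)
    return ("continue sampling", n)


if __name__ == "__main__":
    # Hypothetical boundaries: intercepts +2 and -2, slope 0.25.
    print(walk_chart([True, True, True], 2.0, -2.0, 0.25))
    print(walk_chart([False] * 10, 2.0, -2.0, 0.25))
```

With these example boundaries, a run of defective items ends the test quickly, while a run of good items needs more observations before the lower line is reached.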

The CATES decision model focuses on proportions of proficient
trials (analogous to nondefectives or correct responses),
whereas in previous applications proportions of defectives
or incorrect responses were the items of interest. This approach
does not alter the logic of the sequential sampling
procedure or the decision model. It does enhance the
"meaningfulness" of the procedure in decisions concerning proficiency
because the ultimate goal is to determine "proficiency" rather
than "nonproficiency." It should be noted that in the industrial
quality control setting, sampling occurs after the manufacturing
process. In educational and training applications, sequential
sampling occurred after the learning period. In the CATES
System, the sequential sampling occurs during the learning
period and eventually terminates it.

The CATES decision model can be described as consisting of
decision boundaries. Referring to Figure E1, the parallel
lines represent those decision boundaries. Crossing the upper
line, or boundary, results in a decision to "Reject Lot";
crossing the lower line, or boundary, results in a decision to
"Accept Lot." In the CATES system, these decision boundaries
translate to "Proficient" and "Not Proficient." Calculations
of the decision boundaries require four parameters. These four
parameters are:

Figure E1. Hypothetical Sequential Sampling Chart.


P1 - Lowest acceptable proportion of proficient trials
(P) required to pass the NATOPS flight evaluation
with a grade of "Qualified." Passage of the
NATOPS flight evaluation is required to be considered
a trained aviator in an operational (fleet)
squadron.

P2 - Acceptable proportion of proficient trials (P)
that represent desirable performance on the NATOPS
flight evaluation.

Alpha (α) - The probability of making a TYPE I decision error
(deciding a student is proficient when in fact he
is not proficient).

Beta (β) - The probability of making a TYPE II decision error
(deciding a student is not proficient when in fact
he is proficient).

Parameter setting is a crucial element in the development
of the sequential sampling decision model. Kalisch [1980]
outlines three methods for selecting proficient/not proficient
performance (q0/q1 values):

Method 1 - External Criterion. Individuals are classified
as masters, non-masters, or unknown on the basis of performance
on criteria directly related to the instructional
objectives. These criteria can be in terms of demonstrated
levels of proficiency either on the job or in a training
environment. The mean proportion of items answered correctly
by the masters on an objective would provide an
estimate for q0. Similarly, q1 would be the proportion
correct for the non-masters.

Method 2 - Rationalization. Experts in the subject area
who understand the relation of the training objectives
to the end result (e.g., on-the-job performance) select
the q0 and q1 values to reflect their estimation of the
necessary levels of performance. This method is probably
the closest to that now used by the Air Force. The
procedure may provide somewhat easier decision making
since specifying two values creates an indecision zone:
neither mastery nor non-mastery. This indecision zone
indicates that performance is at a level which may not
be mastery but is not sufficiently poor to be considered
at a non-mastery level.

Method 3 - Representative Sample. The scores of prior
trainees, who demonstrate the entire range from extremely
poor to exemplary performance on objectives, are used
to estimate q0 and q1. The proportion correct for the
entire sample is used to obtain an initial cutting score
C. Scores are separated into two categories: (a) those
scores greater than or equal to C and (b) those scores
less than C. For each category, the mean proportion
correct score is computed. The mean for the first category
equals q0; the mean for the second category equals
q1.
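
The representative sample method (Method 3) is mechanical enough to sketch directly. The code below assumes each prior trainee contributes one proportion-correct score based on the same number of items, so that the proportion correct for the entire sample equals the mean of the scores. The function name and the sample scores are hypothetical.

```python
from statistics import mean


def representative_sample_q(scores):
    """Estimate q0 and q1 from prior trainees' proportion-correct scores.

    Returns (C, q0, q1): the initial cutting score, the mean score of
    trainees at or above C, and the mean score of trainees below C.
    """
    cut = mean(scores)                      # initial cutting score C
    at_or_above = [s for s in scores if s >= cut]
    below = [s for s in scores if s < cut]
    if not below:
        # All scores are identical, so no lower category exists.
        raise ValueError("scores do not separate into two categories")
    return cut, mean(at_or_above), mean(below)


if __name__ == "__main__":
    # Hypothetical scores for four prior trainees.
    print(representative_sample_q([0.5, 0.6, 0.9, 1.0]))
```

The quoted passage calls C an "initial" cutting score; any later refinement of C is not described in this appendix and is not attempted here.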

The selection of alpha (α) and beta (β) should be based
on the criticality of accurate proficiency decisions. Small
values of alpha (α) and beta (β) require additional task trials
to make decisions with greater confidence. Factors that are
important in selecting values for alpha (α) and beta (β) are
outlined below:

(1) Alpha (α) values

(a) Safety - potential harm to the trainee or to
others due to the trainee's actual non-mastery
of the task.

(b) Prerequisite in Instruction - potential problems
in future instruction, especially if the task
is prerequisite to other tasks.

(c) Time/Cost - potential loss or destruction of
equipment either in training or upon fleet
assignment.

(d) Trainee's View of the Training - potential negative
view by trainee when classified as proficient
although the trainee lacks confidence
in that decision. Also, after fleet assignment,
if previous training has not prepared him
sufficiently, the trainee may also have a negative
view of the training program.

(2) Beta (β) values

(a) Instruction - requirement for additional training
resources (personnel and materials) for unnecessary
training in case of misclassification as
not proficient.

(b) Trainee Attitudes - the attitude of trainees
when tasks have been mastered yet training
continues; trainee frustration; corresponding
impact on performance in the remainder of the
training program and fleet assignment.

(c) Cost/Time - the additional cost and time required
for additional training that is not really needed.

After the model parameters have been selected, calculation
of the decision boundaries may be accomplished using the Wald
Binomial Probability Ratio Test. A formal mathematical
discussion of this test follows.

II.  WALD  BINOMIAL  PROBABILITY  RATIO  TEST 

The Wald binomial probability ratio test was developed by
Wald [1947] as a means of making statistical decisions using
as limited a sample as possible. The procedure involves the
consideration of two hypotheses:

$$
H_0: P \le P_1 \quad \text{and} \quad H_1: P \ge P_2
$$

where P is the proportion of nondefectives in the collection under
consideration, P1 is the minimum proportion of nondefectives
at or below which the collection is rejected, and P2 is the
desired proportion of nondefectives, at or above which the
collection is accepted. Since a simple hypothesis is being
tested against a simple alternative, the basis for deciding
between H0 and H1 may be tested using the likelihood ratio:
\[
\frac{P_{2n}}{P_{1n}} =
\frac{(P_2)^{d_n} \, (1 - P_2)^{n - d_n}}
     {(P_1)^{d_n} \, (1 - P_1)^{n - d_n}}
\]



Where: P1 = Minimum proportion of nondefectives at or below
            which the collection is rejected.

       P2 = Desirable proportion of nondefectives at or
            above which the collection is accepted.

       n  = Total items in collection.

       dn = Total nondefectives in collection.
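
As an illustration only, the likelihood ratio defined above can be
evaluated numerically as in the following Python sketch. The function
name likelihood_ratio and the example values P1 = 0.50 and P2 = 0.80
are assumptions made for this example and are not taken from the
original study. The computation is carried out in logarithms so that
long sequences do not underflow.

```python
import math


def likelihood_ratio(d_n, n, p1, p2):
    """Return P2n / P1n for d_n nondefectives observed in n items.

    Hypothetical helper: p1 and p2 are the lower and upper proportions
    of nondefectives (P1 and P2 in the text).
    """
    if not 0 < p1 < p2 < 1:
        raise ValueError("expected 0 < p1 < p2 < 1")
    if not 0 <= d_n <= n:
        raise ValueError("expected 0 <= d_n <= n")
    # log of (P2^dn (1-P2)^(n-dn)) / (P1^dn (1-P1)^(n-dn))
    log_ratio = (d_n * math.log(p2 / p1)
                 + (n - d_n) * math.log((1 - p2) / (1 - p1)))
    return math.exp(log_ratio)


if __name__ == "__main__":
    # Assumed example values: P1 = 0.50, P2 = 0.80.
    for d_n, n in [(0, 1), (1, 1), (4, 5)]:
        print(d_n, n, likelihood_ratio(d_n, n, 0.50, 0.80))
```

Each nondefective item multiplies the ratio by P2/P1, which is greater
than one, and each defective item multiplies it by (1 - P2)/(1 - P1),
which is less than one.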


The sequential testing procedure provides for a postponement
region based on prescribed values of alpha (α) and beta
(β) that approximate the two types of errors found in the
statistical decision process. To test the hypothesis
H0: P = P1, calculate the likelihood ratio and proceed as
follows:

(1) If $P_{2n}/P_{1n} \le \beta/(1-\alpha)$, accept H0.

(2) If $P_{2n}/P_{1n} \ge (1-\beta)/\alpha$, accept H1.

(3) If $\beta/(1-\alpha) < P_{2n}/P_{1n} < (1-\beta)/\alpha$,
    take an additional observation.
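
The three rules above can be sketched in Python as follows. This is a
minimal illustration, not the implementation used in the study: the
function name sprt_decision, the returned strings, and the example
values (P1 = 0.50, P2 = 0.80, alpha = beta = 0.10) are assumptions
made for the example.

```python
import math


def sprt_decision(d_n, n, p1, p2, alpha, beta):
    """Apply the three decision rules to d_n nondefectives in n items."""
    if not (0 < p1 < p2 < 1 and 0 < alpha < 1 and 0 < beta < 1):
        raise ValueError("expected 0 < p1 < p2 < 1 and 0 < alpha, beta < 1")
    if not 0 <= d_n <= n:
        raise ValueError("expected 0 <= d_n <= n")
    # Logarithm of the likelihood ratio P2n / P1n.
    log_ratio = (d_n * math.log(p2 / p1)
                 + (n - d_n) * math.log((1 - p2) / (1 - p1)))
    if log_ratio <= math.log(beta / (1 - alpha)):
        return "accept H0"                 # rule (1)
    if log_ratio >= math.log((1 - beta) / alpha):
        return "accept H1"                 # rule (2)
    return "take another observation"      # rule (3)


if __name__ == "__main__":
    # Assumed example values, chosen only for illustration.
    for d_n, n in [(0, 3), (4, 4), (5, 5)]:
        print(n, d_n, sprt_decision(d_n, n, 0.50, 0.80, 0.10, 0.10))
```

Comparing logarithms rather than the ratios themselves leaves the
decisions unchanged, because the logarithm is strictly increasing.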


These  three  decisions  relate  well  to  the  task  proficiency 
problem.  We  may  use  the  following  rules: 

(1) Accept the hypothesis that the grade of P is accumulated
in lower proportions than acceptable performance would
indicate.

(2) Reject the hypothesis that the grade of P is accumulated
in lower proportions than acceptable performance would
indicate. By rejecting this hypothesis, an alternative
hypothesis is accepted that the grade of P is accumulated in
proportions equal to or greater than desired performance.

(3) Continue training by taking an additional trial(s);
a decision cannot be made with specified confidence.

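The following equations are used to calculate the decision
regions of the sequential sampling decision model.

\[
d_n = \frac{\log \frac{\beta}{1-\alpha}}
           {\log \frac{P_2}{P_1} + \log \frac{1-P_1}{1-P_2}}
    + n \, \frac{\log \frac{1-P_1}{1-P_2}}
                {\log \frac{P_2}{P_1} + \log \frac{1-P_1}{1-P_2}}
\]

\[
d_n = \frac{\log \frac{1-\beta}{\alpha}}
           {\log \frac{P_2}{P_1} + \log \frac{1-P_1}{1-P_2}}
    + n \, \frac{\log \frac{1-P_1}{1-P_2}}
                {\log \frac{P_2}{P_1} + \log \frac{1-P_1}{1-P_2}}
\]
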

Where: dn = Accumulation of trials graded as "P" in the
            sequence.

       n  = Total trials presented in the sequence.

       P1 = Lowest acceptable proportion of proficient trials
            (P) required to pass the NATOPS flight evaluation
            with a grade of "Qualified."

       P2 = Proportion of proficient trials (P) that represents
            desirable performance on the NATOPS flight
            evaluation.

Alpha (α) = The probability of making a type I error (deciding
            a student is proficient when in fact he is not
            proficient).

Beta (β)  = The probability of making a type II error
            (deciding a student is not proficient when
            in fact he is proficient).
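
As a sketch of how the two decision lines might be computed, the
following Python fragment returns the two intercepts and the common
slope. The function name decision_lines and the example parameter
values are assumptions made for illustration; they are not values
selected in this study.

```python
import math


def decision_lines(p1, p2, alpha, beta):
    """Return (lower_intercept, upper_intercept, slope) of the two lines.

    Each line has the form dn = intercept + slope * n. Hypothetical
    helper written for this example only.
    """
    if not (0 < p1 < p2 < 1 and 0 < alpha < 1 and 0 < beta < 1):
        raise ValueError("expected 0 < p1 < p2 < 1 and 0 < alpha, beta < 1")
    # Denominator shared by both terms of both equations.
    denom = math.log(p2 / p1) + math.log((1 - p1) / (1 - p2))
    lower_intercept = math.log(beta / (1 - alpha)) / denom
    upper_intercept = math.log((1 - beta) / alpha) / denom
    slope = math.log((1 - p1) / (1 - p2)) / denom
    return lower_intercept, upper_intercept, slope


if __name__ == "__main__":
    # Assumed example: P1 = 0.50, P2 = 0.80, alpha = beta = 0.10.
    lower, upper, slope = decision_lines(0.50, 0.80, 0.10, 0.10)
    for n in range(1, 11):
        print(n, round(lower + slope * n, 2), round(upper + slope * n, 2))
```

For each trial count n, an accumulated dn at or below the lower line
corresponds to accepting H0, a dn at or above the upper line
corresponds to accepting H1, and a dn between the two lines calls for
another trial.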

The first term of the two equations determines the
intercepts of the two linear equations. The width between
these intercepts is determined largely by the values selected for
alpha (α) and beta (β). The width between the intercepts
translates into a region of uncertainty; thus, as lower values
of alpha (α) and beta (β) are selected, this region of
uncertainty widens.
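
The effect of alpha and beta on the width of the uncertainty region
can be checked numerically. The sketch below is illustrative only;
the function name uncertainty_width and the parameter values are
assumptions and do not come from the study.

```python
import math


def uncertainty_width(p1, p2, alpha, beta):
    """Vertical distance between the two intercepts of the decision lines."""
    if not (0 < p1 < p2 < 1 and 0 < alpha < 1 and 0 < beta < 1):
        raise ValueError("expected 0 < p1 < p2 < 1 and 0 < alpha, beta < 1")
    denom = math.log(p2 / p1) + math.log((1 - p1) / (1 - p2))
    upper = math.log((1 - beta) / alpha) / denom
    lower = math.log(beta / (1 - alpha)) / denom
    return upper - lower


if __name__ == "__main__":
    # Assumed P1 = 0.50 and P2 = 0.80; alpha and beta are set equal.
    for error in (0.20, 0.10, 0.05, 0.01):
        print(error, round(uncertainty_width(0.50, 0.80, error, error), 3))
```

With P1 and P2 held fixed, the printed width grows as the error
probabilities shrink, which matches the statement above.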

The second term of the equations determines the slope
of each linear equation. Since the second term is the same
for both equations, the result is two parallel lines.
Values of P1 and P2, as well as differences between
P1 and P2, affect the slope of the lines. This is easily
translated into task difficulty. As P2 values increase,
indicating easier tasks, the slope becomes steeper. This in
turn results in fewer trials required in the sample to reach
a decision.
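
A short numerical check of the slope behavior is sketched below. The
function name slope and the values of P1 and P2 are assumed for the
example and are not taken from the study.

```python
import math


def slope(p1, p2):
    """Common slope of the two decision lines (second term of the equations)."""
    if not 0 < p1 < p2 < 1:
        raise ValueError("expected 0 < p1 < p2 < 1")
    numerator = math.log((1 - p1) / (1 - p2))
    return numerator / (math.log(p2 / p1) + numerator)


if __name__ == "__main__":
    # P1 is held at an assumed 0.50 while P2 increases.
    for p2 in (0.70, 0.80, 0.90):
        print(p2, round(slope(0.50, p2), 3))
```

In this example the slope rises as P2 rises, and in each case it lies
between P1 and P2.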

As differences in P1 and P2 increase, the slope also
becomes steeper and the uncertainty region decreases. This is
consonant with rational decision making. When the difference
between the lower level of proficiency and upper level of
proficiency is great, it is easier to determine at which
proficiency level the pilot trainee is performing. The concept
of differences in P1 and P2 is analogous to the concept of
effect size in statistically testing the difference between
the means of two groups. In such statistical testing, when
alpha (α) and beta (β) remain constant, the number of
observations required to detect a significant difference may be
reduced as the anticipated effect size increases [Kalisch,
1980].
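
One simple way to see the effect of the difference between P1 and P2
on sample size is to count how many consecutive proficient trials a
trainee would need before the accept-H1 boundary is reached. The
sketch below is an illustration under assumed values; the function
name min_consecutive_p_trials is hypothetical, and the quantity it
returns is a best-case count, not the expected number of trials.

```python
import math


def min_consecutive_p_trials(p1, p2, alpha, beta):
    """Smallest n with dn = n for which the likelihood ratio reaches (1-beta)/alpha."""
    if not (0 < p1 < p2 < 1 and 0 < alpha < 1 and 0 < beta < 1):
        raise ValueError("expected 0 < p1 < p2 < 1 and 0 < alpha, beta < 1")
    if alpha + beta >= 1:
        raise ValueError("expected alpha + beta < 1")
    # With every trial graded P, the log likelihood ratio is n * log(P2/P1).
    return math.ceil(math.log((1 - beta) / alpha) / math.log(p2 / p1))


if __name__ == "__main__":
    # Assumed P1 = 0.50 and alpha = beta = 0.10; only P2 varies.
    for p2 in (0.60, 0.80, 0.90):
        print(p2, min_consecutive_p_trials(0.50, p2, 0.10, 0.10))
```

With these assumed values the count falls from 13 trials at P2 = 0.60
to 5 trials at P2 = 0.80 and 4 trials at P2 = 0.90, so a wider
separation between P1 and P2 allows an earlier decision.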